Abstract — Finding the sparse solution of an underdetermined
system of linear equations has many applications; in particular, it is
used in Compressed Sensing (CS), Sparse Component Analysis
(SCA), and sparse decomposition of signals on overcomplete
dictionaries. We have recently proposed a fast algorithm, called
Smoothed $\ell^0$ (SL0), for this task. Contrary to many other sparse
recovery algorithms, SL0 is not based on minimizing the $\ell^1$ norm,
but tries to directly minimize the $\ell^0$ norm of the solution. The
basic idea of SL0 is optimizing a sequence of certain (continuous)
cost functions approximating the $\ell^0$ norm of a vector. However,
in previous papers, we did not provide a complete convergence
proof for SL0. In this paper, we study the convergence properties
of SL0, and show that under a certain sparsity constraint in
terms of the Asymmetric Restricted Isometry Property (ARIP), and
with a certain choice of parameters, the convergence of SL0
to the sparsest solution is guaranteed. Moreover, we study the
complexity of SL0, and we show that whenever the dimension
of the dictionary grows, the complexity of SL0 increases with
the same order as Matching Pursuit (MP), which is one of the
fastest existing sparse recovery methods, while contrary to MP, its
convergence to the sparsest solution is guaranteed under certain
conditions which are satisfied through the choice of parameters.

Index Terms — Compressed Sensing (CS), Sparse Component 
Analysis (SCA), Sparse Decomposition, Atomic Decomposition, 
Over-complete Signal Representation, Sparse Source Separation. 



I. Introduction 

The sparse solution of an Underdetermined System of Linear
Equations (USLE) has recently attracted the attention
of many researchers from different viewpoints, because of
its potential applications in many different problems. It is
used, for example, in Compressed Sensing (CS) [1], [2],
[3], underdetermined Sparse Component Analysis (SCA) and
source separation [4], [5], [6], [7], atomic decomposition
on overcomplete dictionaries [8], [9], decoding real field
codes [10], etc.

Let $\mathbf{x}$ be a known $n \times 1$ vector and $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_m]$ be
a known $n \times m$ matrix with $m > n$, where the $\mathbf{a}_i$'s denote its
columns. Then, we can seek the sparsest solution of the USLE
$\mathbf{A}\mathbf{s} = \mathbf{x}$ given by

$$(P_0): \quad \min_{\mathbf{s}} \|\mathbf{s}\|_0 \quad \text{s.t.} \quad \mathbf{A}\mathbf{s} = \mathbf{x}, \tag{1}$$

where $\|\cdot\|_0$ is simply the number of nonzero components
(conventionally called the $\ell^0$ norm although it is not a true
norm). From the atomic decomposition viewpoint, $\mathbf{x}$ is a signal which
is to be decomposed as a linear combination of the signals $\mathbf{a}_i$,
$i = 1, \ldots, m$, where the $\mathbf{a}_i$'s are called 'atoms', and $\mathbf{A}$ is called
the 'dictionary' over which the signal is to be decomposed [11].

A system $\mathbf{A}$ is said [12] to satisfy the Unique Representation
Property (URP) if any $n \times n$ sub-matrix of $\mathbf{A}$ is invertible.
It is known [12], [13], [14] that for any system satisfying the
URP, the solution to (1) is unique, that is, if a solution $\mathbf{s}_0$
satisfying $\|\mathbf{s}_0\|_0 < n/2$ exists, then any other solution $\mathbf{s}$ has
$\|\mathbf{s}\|_0 > n/2$. Therefore, under the URP assumption, we can talk
about 'the sparsest solution'.

Solving (1) using a combinatorial search is NP-hard. Many
alternative algorithms have been proposed to solve this problem.
Two frequently used approaches are Matching Pursuit
(MP) [11] and Basis Pursuit (BP) [8], which have many
variants. MP is a fast algorithm but it cannot be guaranteed to
find the sparsest solution. BP is based on replacing $\ell^0$ with the
$\ell^1$ norm, which can be minimized using Linear Programming
techniques. BP is computationally more complex than MP, but
it can find the sparsest solution with high probability, provided
this solution is sufficiently sparse [13], [14], [2], [15].

In [16] and [17], we proposed an algorithm for solving
(1), called Smoothed $\ell^0$ (SL0), which provides a fast solution
within a small Euclidean distance of the sparsest solution.
The main idea was to approximate the $\ell^0$ norm by a smooth
function (hence the name "smoothed $\ell^0$"). More precisely,
$\|\mathbf{s}\|_0$ is approximated by a continuous function^1 $m - F_\sigma(\mathbf{s})$,
where $\sigma$ determines the quality of approximation: the larger
$\sigma$, the smoother $F_\sigma(\cdot)$ but the worse the approximation to
$\ell^0$, and vice versa. Hence, the solution tends to the sparsest
solution when $\sigma \to 0$. Therefore, the objective underlying
SL0 is to maximize $F_\sigma(\mathbf{s})$ (subject to $\mathbf{A}\mathbf{s} = \mathbf{x}$) for some very
small value of $\sigma$. However, for small values of $\sigma$, $F_\sigma(\cdot)$ has
many local maxima and hence its maximization is not easy.
Therefore, SL0 uses a Graduated Non-Convexity (GNC) [18]
approach: it starts from a very large $\sigma$ (for which there are
no local maxima), and gradually decreases $\sigma$ to zero. The
maximum of $F_\sigma(\cdot)$ is used as a starting point to locate the
maximum of $F_\sigma(\cdot)$ for the next (smaller) $\sigma$ using a steepest
ascent approach. Since the value of $\sigma$ has only slightly
decreased, the maximizer of $F_\sigma(\cdot)$ for this new $\sigma$ is not too
far from the maximizer of $F_\sigma(\cdot)$ for the previous (larger) $\sigma$,
and hence it is hoped that it does not get trapped into a local
maximum. Figure 1 shows the basics of the SL0 algorithm^2.

^1In this form, $F_\sigma(\mathbf{s})$ is an approximation to the number of 'zero's of $\mathbf{s}$,
that is, $m - \|\mathbf{s}\|_0$.

• Initialization: Set $\hat{\mathbf{s}}_0 = \mathbf{A}^\dagger \mathbf{x}$. Choose a suitable decreasing
sequence for $\sigma$: $[\sigma_1 \ldots \sigma_J]$.

• For $j = 1, \ldots, J$:

1) Let $\sigma = \sigma_j$.

2) Maximize $F_\sigma(\mathbf{s})$ subject to $\mathbf{A}\mathbf{s} = \mathbf{x}$, using $L$ iterations of
steepest ascent:

- Initialization: $\mathbf{s} = \hat{\mathbf{s}}_{j-1}$.

- For $\ell = 1, 2, \ldots, L$

a) Let $\mathbf{s} \leftarrow \mathbf{s} + (\mu\sigma^2)\nabla F_\sigma(\mathbf{s})$.

b) Project $\mathbf{s}$ back onto the feasible set $\{\mathbf{s} \mid \mathbf{A}\mathbf{s} = \mathbf{x}\}$:
$\mathbf{s} \leftarrow \mathbf{s} - \mathbf{A}^\dagger(\mathbf{A}\mathbf{s} - \mathbf{x})$.

3) Set $\hat{\mathbf{s}}_j = \mathbf{s}$.

• Final answer is $\hat{\mathbf{s}} = \hat{\mathbf{s}}_J$.

Fig. 1. Basics of the SL0 algorithm [17]. $\mathbf{A}^\dagger$ stands for the Moore-Penrose
pseudo inverse of $\mathbf{A}$ (i.e. $\mathbf{A}^\dagger = \mathbf{A}^T(\mathbf{A}\mathbf{A}^T)^{-1}$).
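
The two nested loops of Fig. 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it uses the Gaussian family $f_\sigma(s) = \exp(-s^2/2\sigma^2)$ of [17], it assumes the rows of the dictionary are orthonormal so that the pseudo-inverse reduces to the transpose, and the parameter values (`c`, `mu`, `L`, `sigma_min`, the starting $\sigma$) as well as the toy dictionary are hypothetical choices made for the example.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def project(A, At, s, x):
    """Project s onto {s | As = x}. Assumes A A^T = I, so that A^+ = A^T."""
    residual = [a - b for a, b in zip(matvec(A, s), x)]
    correction = matvec(At, residual)
    return [si - ci for si, ci in zip(s, correction)]


def sl0(A, x, sigma_min=1e-4, c=0.5, mu=2.0, L=3):
    """Sketch of Fig. 1 with f_sigma(s) = exp(-s^2 / (2 sigma^2))."""
    At = [list(col) for col in zip(*A)]
    s = matvec(At, x)                       # minimum l2-norm solution A^+ x
    sigma = 2.0 * max(abs(v) for v in s)    # hypothetical starting value
    while sigma > sigma_min:                # outer loop: decrease sigma
        for _ in range(L):                  # inner loop: L ascent steps
            # s <- s + (mu sigma^2) grad F_sigma(s), written out for the Gaussian
            s = [v - mu * v * math.exp(-v * v / (2.0 * sigma * sigma)) for v in s]
            s = project(A, At, s, x)
        sigma *= c
    return s


# Toy dictionary with orthonormal rows (n = 3, m = 5) and a 1-sparse s0.
A = [
    [1 / math.sqrt(5)] * 5,
    [t / math.sqrt(10) for t in (-2, -1, 0, 1, 2)],
    [t / math.sqrt(14) for t in (2, -1, -2, -1, 2)],
]
s0 = [0.0, 0.0, 3.0, 0.0, 0.0]
x = matvec(A, s0)
s_hat = sl0(A, x)
residual = max(abs(a - b) for a, b in zip(matvec(A, s_hat), x))
```

Because every inner iteration ends with the projection step, the returned vector is feasible up to rounding error whatever the parameter values; whether it is also the sparsest solution depends on the choice of the $\sigma$ sequence, $\mu$ and $L$.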

From Fig. 1, SL0 consists of two loops: the 'outer' loop
is the loop in which $\sigma$ is decreased, and the 'inner' loop is
the one in which $F_\sigma(\mathbf{s})$ is iteratively maximized (subject to
$\mathbf{A}\mathbf{s} = \mathbf{x}$) for the fixed choice of $\sigma$. In [17], we prove that if
the inner loop does not get trapped in a local maximum, our
solution will converge to the solution of (1) as $\sigma \to 0$ in the
outer loop. In other words, if $\sigma$ is decreased so gradually that
the GNC approach works and we have avoided local maxima
in the inner loop, then our method will produce the desired
results.

However, a complete convergence analysis of SL0, as well
as the choice of SL0 parameters that guarantees avoiding local
maxima in the inner loops, remained to be shown. In particular,
we want to know 1) the rate of decrease of $\sigma$, 2) how many
times we have to repeat the inner loop (the value of $L$), and
3) how to choose $\mu$ in Fig. 1. In this paper, we present a complete
convergence analysis of SL0 for both noiseless and noisy
cases, and we present parameter settings that guarantee SL0
convergence to the solution of (1). In contrast to the exponential
family of functions used for approximating the $\ell^0$ norm in
[17], the analysis here uses a family of spline functions for
this aim.

Note that, in practice, the values of SL0 parameters that
guarantee the convergence to the solution of (1) are not
necessarily 'good' values. These values provide a theoretical
support for the SL0 algorithm, but they are often excessively
pessimistic and result in slower convergence of the algorithm
compared to a typical behavior (see also Section VI of [9]).

^2Two other points in Fig. 1 are: 1) The initial guess for the sparsest solution
is the minimum $\ell^2$ norm solution of $\mathbf{A}\mathbf{s} = \mathbf{x}$, which corresponds [17] to the
maximizer of $F_\sigma(\mathbf{s})$ where $\sigma \to \infty$, and 2) The step-size of the steepest
ascent is decreased proportional to $\sigma^2$ [17].



A. Restricted Isometry and Overview of the Results

The analysis is developed here using the Asymmetric Restricted
Isometry Constants (ARICs) [19], [20], [21], [22], in
order to relate our work to $\ell^1$-minimization. The asymmetric
$k$-restricted constants $\delta_k^{\min}$ and $\delta_k^{\max}$ are defined as the smallest
nonnegative numbers satisfying

$$(1 - \delta_k^{\min})\|\mathbf{s}\|_2^2 \le \|\mathbf{A}\mathbf{s}\|_2^2 \le (1 + \delta_k^{\max})\|\mathbf{s}\|_2^2 \tag{2}$$

for any $\mathbf{s} \in \mathbb{R}^m$ with $\|\mathbf{s}\|_0 \le k$.

Let $\mathbf{s}_0$ be the solution of (1) and $\|\mathbf{s}_0\|_0 = k$. We show that
SL0 recovers this solution provided that

$$\alpha\,\delta^{\min}_{\lceil 2k\alpha \rceil} + \|\mathbf{A}\|_2^2 < \alpha \tag{3}$$

for any $\alpha > 1$, in which $\|\mathbf{A}\|_2$ denotes the Euclidean (spectral) norm
of $\mathbf{A}$, and $\lceil 2k\alpha \rceil$ denotes the nearest integer greater than
or equal to $2k\alpha$. More precisely, we derive a family of
sufficient conditions for the performance of SL0 that depend
on the parameter $\alpha$.

The ARICs are easy to calculate exactly for small scale 
systems, but the complexity grows exponentially as the scale 
grows. In fact, the value of ARICs depends on singular values 
of sub-matrices of the matrix A. Then, using the results of 
Il23l . 1241 . Il25l . mi, EOl, im, we analyze the behavior 
of SL0 for large Gaussian random dictionaries. To achieve
bounds similar to the existing ones for $\ell^1$ minimization methods,
we use a popular result in Random Matrix Theory [26],
[27], to derive Corollary 4 of Section III, which can be viewed
as the SL0 counterpart of Theorem 3.1 of [28]. Specifically, we
identify $\rho(\alpha) > 0$, for any $0 < \alpha < 1$, such that for large
scale^3 systems satisfying $n/m \to \alpha$ and $m \to \infty$, SL0 can recover
any sparse solution $\mathbf{s}$ with $\|\mathbf{s}\|_0 < \rho(\alpha)m$ from a (possibly
noisy) measurement $\mathbf{x}$.

One of the bottlenecks of Compressed Sensing methods for
handling large scale systems is the decoding complexity (see
[10] for the definition of encoding and decoding in the compressed
sensing context). In BP, decoding complexity is known to be
$m^3$ [10], [23], or $m^{1.5}n^2$ for the cases where $n$ is much smaller
than $m$ [29], [30]. The coding complexity is $mn$. The MP method
has the smallest possible complexity for both encoding and
decoding, which is $mn$ [31]. For certain classes of systems,
the complexity can be further reduced to $m \log m$ [32]. In
this paper, we will see (in Section VI-C) that the coding and
decoding complexities of SL0 are similar to those of MP.

Since (1) is NP-hard, one may wonder whether proving convergence
of SL0 (with a complexity growing quadratically with
scale) means that NP = P. This is not the case. Note that in
BP, too, the guarantee that BP will find the solution of (1) does
not mean that NP = P, because such a guarantee only exists
in the case of a very sparse solution. Our analysis has
a similar limitation.

The paper is organized as follows. In Section II, assuming
that the internal loop of Fig. 1 exactly follows the steepest
ascent trajectory (in other words, we ignore the effect of $\mu$
and $L$, or implicitly assume that $\mu \to 0$ and $L \to \infty$),
we analyze the convergence of the resultant (i.e. asymptotic)
^3By scale we mean the number of rows, $n$, and the number of columns,
$m$, of the dictionary.
SL0. Indeed, in this section, Theorem 2 proposes a geometric
$\sigma$ sequence which guarantees the local concavity of the cost
functions and the convergence of the internal loop of SL0
to the true maximizer of $F_\sigma$, and hence the convergence
of asymptotic SL0 to the sparsest solution. This sequence
depends on the ARIP constants of the dictionary, which are
not easy to calculate. Hence, in Section III we discuss the
behavior of asymptotic SL0 in the case of large random
Gaussian dictionaries. Corollary 4 of this section corresponds to
Donoho's results for $\ell^1$ minimization, Theorem 3.1 of [28].
In Section IV we consider the effect of a non-ideal $\mu$, that is,
where the internal loop does not follow exactly the steepest
ascent trajectory, and makes discrete jumps in the steepest
ascent direction. We provide a choice for $\mu$ which guarantees
stability of the internal loop and convergence to the maximizer
as $L \to \infty$. Then, after a discussion on the noisy case in
Section V, we derive a (finite) value for $L$ in Section VI which
guarantees the convergence of SL0 to the sparsest solution.
This completes our convergence analysis of SL0. Further in
Section VI (Theorem 7), we study the complexity of SL0
and prove that it is of order $O(m^2)$, that is, the same as for
MP, which is the fastest known algorithm in the field. Finally
we address multiple sparse solution recovery with SL0 and
show that the order of complexity of SLO can be reduced to 
in this case. 

II. Convergence Analysis in noiseless case
A. Basic Definitions

In [17], we first choose a continuous function $f_\sigma$ that
asymptotically approximates a Kronecker delta:

$$\lim_{\sigma \to 0} f_\sigma(s) = \begin{cases} 1 & \text{if } s = 0 \\ 0 & \text{if } s \neq 0 \end{cases}, \tag{4}$$

and use it to approximate $\|\mathbf{s}\|_0$ by $m - F_\sigma(\mathbf{s})$ where $F_\sigma(\mathbf{s}) =
\sum_{i=1}^m f_\sigma(s_i)$. Then, it is shown that under some mild conditions
on $f_\sigma(\cdot)$, maximizing $F_\sigma(\mathbf{s})$ on $\mathbf{A}\mathbf{s} = \mathbf{x}$ for a small $\sigma$,
using a GNC approach, will recover the sparse solution. To
avoid being trapped into local maxima, one may wish to design
a continuous concave function $f_\sigma$ that can asymptotically
approximate a Kronecker delta, but, taking into account the
shape of any approximation to the Kronecker delta, this is
not possible. However, we note that even for non-concave
continuous functions, if the function is concave in the vicinity
of the global maximum, then by starting from any point
sufficiently close to the global maximum, steepest ascent will
converge to the global maximum. In this section we investigate
conditions under which $F_\sigma$ subject to $\mathbf{A}\mathbf{s} = \mathbf{x}$ is concave near
the global maximum, and how these can be used in designing
a sequence of $\sigma$ that forces SL0 to converge to the global
maximum.

Remark 1. Without loss of generality, we assume that the
rows of $\mathbf{A}$ are orthonormal, i.e. $\mathbf{A}\mathbf{A}^T = \mathbf{I}_n$, where $\mathbf{I}_n$ stands
for the $n \times n$ identity matrix. In effect, if the rows of $\mathbf{A}$ are not
orthonormal, performing a Gram-Schmidt orthonormalization
on the rows of $\mathbf{A}$ (and doing the corresponding operations on
$\mathbf{x}$, too) gives rise to an equivalent system of equations with
the same set of solutions and with orthonormal rows of its
dictionary.
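
The row orthonormalization described in Remark 1 can be sketched as follows. This is a minimal illustration, assuming a small dense system stored as Python lists and linearly independent rows; the function name `orthonormalize_rows` and the example system are hypothetical. Every row operation applied to the dictionary is mirrored on the right-hand side, so the solution set is unchanged.

```python
import math


def orthonormalize_rows(A, x):
    """Gram-Schmidt on the rows of A, mirroring every row operation on x."""
    Q, y = [], []
    for row, xi in zip(A, x):
        r, b = list(row), xi
        for q, yq in zip(Q, y):
            coef = sum(a * c for a, c in zip(row, q))   # component of row along q
            r = [a - coef * c for a, c in zip(r, q)]
            b -= coef * yq                              # same operation on x
        norm = math.sqrt(sum(a * a for a in r))
        if norm < 1e-12:
            raise ValueError("rows of A are linearly dependent")
        Q.append([a / norm for a in r])
        y.append(b / norm)
    return Q, y


A = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
s = [1.0, 2.0, 3.0]                    # one solution of As = x
x = [3.0, 4.0]
Q, y = orthonormalize_rows(A, x)
dot = lambda u, v: sum(a * b for a, b in zip(u, v))
```

Here `Q` has orthonormal rows and `s` still solves the transformed system `Q s = y`.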

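Moreover, for any matrix $\mathbf{A}$ with orthonormal rows, by
expanding the set of rows of $\mathbf{A}$, one can find a matrix
$\mathbf{D} \in \mathbb{R}^{(m-n) \times m}$ such that $\mathbf{Q} = [\mathbf{A}^T, \mathbf{D}^T]^T$ is orthonormal.
We note then that:

$$\mathbf{A}\mathbf{A}^T = \mathbf{I}_n, \quad \mathbf{D}\mathbf{D}^T = \mathbf{I}_{m-n}, \quad \mathbf{A}\mathbf{D}^T = \mathbf{0}, \quad \mathbf{A}^T\mathbf{A} + \mathbf{D}^T\mathbf{D} = \mathbf{I}_m. \tag{5}$$

The rows of the matrix $\mathbf{D}$ are an orthonormal basis for the
null-space of $\mathbf{A}$. Moreover, for any $\mathbf{s}$ satisfying $\mathbf{A}\mathbf{s} = \mathbf{0}$ we
have

$$\|\mathbf{D}\mathbf{s}\| = \|\mathbf{Q}\mathbf{s}\| = \|\mathbf{s}\|, \tag{6}$$

where, throughout the paper, $\|\cdot\|$ stands for the $\ell^2$ norm of a
vector.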

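Definition 1: Let $\pi_i : \mathbb{R}^m \to \mathbb{R}$ be the projection of $\mathbf{s} =
[s_1, \ldots, s_m]^T$ onto the $i$th axis, i.e. $\pi_i(\mathbf{s}) = s_i$. Moreover, let
$\pi_I(\mathbf{s}) = (s_{i_1}, \ldots, s_{i_r})^T$ for $I = \{i_1 < i_2 < \cdots < i_r\} \subseteq
\{1 \cdots m\}$. Also let $I^c = \{1 \cdots m\} - I$.

Example. For $\mathbf{s} = (2, 3, 4, 7)^T$ and $I = \{1, 3\}$, we have
$\pi_3(\mathbf{s}) = 4$, $\pi_I(\mathbf{s}) = (2, 4)^T$, and $\pi_{I^c}(\mathbf{s}) = (3, 7)^T$.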
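
The projections of Definition 1 amount to plain index selection. The short sketch below reproduces the example above; the helper names `project_onto` and `complement` are hypothetical, and indices are 1-based to match the notation of the text.

```python
def project_onto(s, I):
    """pi_I(s): entries of s at the (1-based) indices in I, in increasing order."""
    return [s[i - 1] for i in sorted(I)]


def complement(I, m):
    """I^c = {1, ..., m} - I."""
    members = set(I)
    return [i for i in range(1, m + 1) if i not in members]


s = [2, 3, 4, 7]
I = {1, 3}
pi_I = project_onto(s, I)
pi_Ic = project_onto(s, complement(I, len(s)))
pi_3 = project_onto(s, [3])[0]
```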

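Definition 2: For the matrix $\mathbf{A}$ we define:

$$\gamma_{\mathbf{A}}(n_0) \triangleq \max_{|I| \le n_0} \max_{\mathbf{A}\mathbf{s} = \mathbf{0}} \frac{\|\pi_I(\mathbf{s})\|^2}{\|\pi_{I^c}(\mathbf{s})\|^2}
= \max_{|I| \le n_0} \max_{\mathbf{A}\mathbf{s} = \mathbf{0}} \frac{\|\mathbf{s}\|^2 - \|\pi_{I^c}(\mathbf{s})\|^2}{\|\pi_{I^c}(\mathbf{s})\|^2}
= \max_{|I| \le n_0} \max_{\mathbf{A}\mathbf{s} = \mathbf{0}} \frac{\|\mathbf{s}\|^2}{\|\pi_{I^c}(\mathbf{s})\|^2} - 1, \tag{7}$$

where $|I|$ represents the cardinality of $I$. We will use the notation $\gamma(n_0) =
\gamma_{\mathbf{A}}(n_0)$ whenever there is no ambiguity about the
matrix $\mathbf{A}$.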
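
In general, evaluating (7) requires a search over index sets and over the whole null space. As an illustrative special case only (an assumption made for this example, not a setting treated in the text), suppose $n = m - 1$, so that the null space of the dictionary is spanned by a single vector $\mathbf{d}$. The ratio in (7) is scale invariant, so the inner maximization disappears and the best index set of a given size simply collects the largest-magnitude entries of $\mathbf{d}$. The function name `gamma_1d_nullspace` is hypothetical.

```python
def gamma_1d_nullspace(d, n0):
    """gamma(n0) of (7) when null(A) is spanned by the single nonzero vector d."""
    sq = sorted((v * v for v in d), reverse=True)
    best = 0.0
    for size in range(1, n0 + 1):
        num = sum(sq[:size])      # energy on the index set I (largest entries)
        den = sum(sq[size:])      # energy on the complement I^c
        if den == 0:
            return float("inf")
        best = max(best, num / den)
    return best


d = [1.0, 2.0, 2.0]               # hypothetical null-space direction (m = 3, n = 2)
g1 = gamma_1d_nullspace(d, 1)     # 4 / (1 + 4)
g2 = gamma_1d_nullspace(d, 2)     # (4 + 4) / 1
```

The two values also illustrate that $\gamma(n_0)$ does not decrease when $n_0$ grows.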

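Remark 2. Let $\mathrm{null}(\mathbf{A}) = \{\mathbf{s} \in \mathbb{R}^m \mid \mathbf{A}\mathbf{s} = \mathbf{0}\}$ denote the
null space of $\mathbf{A}$. Then for any $\mathbf{s} \in \mathrm{null}(\mathbf{A})$:

$$\mathbf{A}\mathbf{s} = \mathbf{0} \;\Rightarrow\; \mathbf{A}_I\mathbf{s}_I + \mathbf{A}_{I^c}\mathbf{s}_{I^c} = \mathbf{0} \;\Rightarrow\; \|\mathbf{A}_I\mathbf{s}_I\| = \|\mathbf{A}_{I^c}\mathbf{s}_{I^c}\|, \tag{8}$$

where $\mathbf{A}_I$ and $\mathbf{A}_{I^c}$ are sub-matrices of $\mathbf{A}$ containing the columns
indexed by $I$ and $I^c$, respectively, $\mathbf{s}_I = \pi_I(\mathbf{s})$ and $\mathbf{s}_{I^c} =
\pi_{I^c}(\mathbf{s})$. Now let $\sigma_{\min}(\cdot)$ and $\sigma_{\max}(\cdot)$ stand for the smallest
and largest singular values of a matrix^4. Then from (8) and

$$\|\mathbf{A}_I\mathbf{s}_I\| \ge \sigma_{\min}(\mathbf{A}_I)\|\mathbf{s}_I\|, \qquad \|\mathbf{A}_{I^c}\mathbf{s}_{I^c}\| \le \sigma_{\max}(\mathbf{A}_{I^c})\|\mathbf{s}_{I^c}\|, \tag{9}$$

we will have:

$$\gamma(n_0) \le \max_{|I| \le n_0} \frac{\sigma^2_{\max}(\mathbf{A}_{I^c})}{\sigma^2_{\min}(\mathbf{A}_I)}. \tag{10}$$

By a similar argument:

$$\gamma(n_0) + 1 \le \max_{|I| \le n_0} \frac{\|\mathbf{A}\|_2^2}{\sigma^2_{\min}(\mathbf{A}_I)}, \tag{11}$$

where $\|\cdot\|_2$ denotes the spectral norm of a matrix, that is, its
largest singular value.

^4While it is common in the literature to define singular values to be strictly
positive, in this paper we use the definition of Horn and Johnson [33, pp.
414-415], in which the number of singular values of a $p \times q$ matrix $\mathbf{M}$ is
fixed equal to $\min(p, q)$, and hence the singular values of $\mathbf{M}$ are the square
roots of the $\min(p, q)$ largest eigenvalues of $\mathbf{M}^T\mathbf{M}$ (or $\mathbf{M}\mathbf{M}^T$). Using this
definition, a matrix can have zero singular values, where a zero singular value
characterizes a non-full-rank matrix.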

Remark 3. $\gamma(n) < \infty$ as long as $\mathbf{A}$ satisfies the URP.
Observe that for any subset with $|I| \le n$, we have $\sigma^2_{\max}(\mathbf{A}_{I^c}) <
\infty$. When $\mathbf{A}$ has the URP, the columns of $\mathbf{A}_I$ are linearly
independent as long as $|I| \le n$, and hence $\sigma^2_{\min}(\mathbf{A}_I) > 0$.
Then (10) implies that $\gamma(n)$ is finite.

Remark 4. $\gamma(n_0)$ is clearly an increasing function of $n_0$.

Remark 5. Our definition of $\gamma(n_0)$ in (7) relates to the
lower ARIC defined in (2). From (11) it is easy to see that for
the ARIC $\delta^{\min}_{n_0}$ satisfying (2):

$$\gamma(n_0) + 1 \le \frac{\|\mathbf{A}\|_2^2}{1 - \delta^{\min}_{n_0}}. \tag{12}$$

Considering the existing upper bounds on the ARIP constants
[20], it is straightforward to find an upper bound on $\gamma(n_0)$.
We discuss the upper bound on $\gamma(n_0)$ in Section III.

Remark 6. For any $n \times n$ nonsingular matrix $\mathbf{Q}$, the null
spaces of $\mathbf{A}$ and $\mathbf{Q}\mathbf{A}$ are equal, i.e. $\{\mathbf{s} \mid \mathbf{A}\mathbf{s} = \mathbf{0}\} = \{\mathbf{s} \mid \mathbf{Q}\mathbf{A}\mathbf{s} =
\mathbf{0}\}$. Therefore, $\gamma_{\mathbf{A}}(n_0) = \gamma_{\mathbf{Q}\mathbf{A}}(n_0)$ for any value of $n_0$.

Remark 7. Gram-Schmidt orthonormalization involves
left side multiplication by a nonsingular matrix. Therefore, it
does not change the value of $\gamma$.

In [17], we had used a family of Gaussian functions to
approximate the $\ell^0$ norm. In this paper we use quadratic
splines instead. The second order derivative of these splines
is easy to manipulate and this simplifies our convergence
analysis.



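Definition 3: Let $f_\gamma$ denote a quadratic spline
with knots at $\{+1, -1, 1 + \gamma, -1 - \gamma\}$, that is:

$$f_\gamma(s) \triangleq \begin{cases} 1 - s^2/(1 + \gamma) & \text{if } |s| \le 1 \\ (|s| - \gamma - 1)^2/(\gamma^2 + \gamma) & \text{if } 1 \le |s| \le 1 + \gamma \\ 0 & \text{if } |s| \ge 1 + \gamma \end{cases}. \tag{13}$$

We also define

$$f_{\gamma,\sigma}(s) \triangleq f_\gamma(s/\sigma) \tag{14}$$

and

$$F_{\gamma,\sigma}(\mathbf{s}) \triangleq \sum_{i=1}^m f_{\gamma,\sigma}(s_i). \tag{15}$$

In the rest of this paper, we use the notation $F_\gamma = F_{\gamma,1}$. We
also use $F_\sigma = F_{\gamma,\sigma}$ whenever there is no ambiguity about $\gamma$.

Remark 8. $f_\gamma$ and $f'_\gamma$ are both continuous, so that

$$f'_\gamma(s) = \begin{cases} -2s/(1 + \gamma) & \text{if } |s| \le 1 \\ 2s/(\gamma^2 + \gamma) - 2/\gamma & \text{if } 1 \le s \le 1 + \gamma \\ 2s/(\gamma^2 + \gamma) + 2/\gamma & \text{if } -1 - \gamma \le s \le -1 \\ 0 & \text{if } |s| \ge 1 + \gamma \end{cases}, \tag{16}$$

and

$$f''_\gamma(s) = \begin{cases} -2/(\gamma + 1) & \text{if } |s| < 1 \\ 2/(\gamma^2 + \gamma) & \text{if } 1 < |s| < 1 + \gamma \\ 0 & \text{if } |s| > 1 + \gamma \end{cases}. \tag{17}$$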
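
As a small numerical companion to (13)-(16), the spline and its derivative can be coded directly. This is a sketch for illustration; the function names `f_gamma`, `df_gamma` and `F` and the value $\gamma = 0.5$ used below are arbitrary choices, not taken from the text.

```python
def f_gamma(s, gamma):
    """Quadratic spline of (13)."""
    a = abs(s)
    if a <= 1.0:
        return 1.0 - s * s / (1.0 + gamma)
    if a <= 1.0 + gamma:
        return (a - gamma - 1.0) ** 2 / (gamma * gamma + gamma)
    return 0.0


def df_gamma(s, gamma):
    """First derivative of the spline, as in (16)."""
    a = abs(s)
    if a <= 1.0:
        return -2.0 * s / (1.0 + gamma)
    if a <= 1.0 + gamma:
        sign = 1.0 if s > 0 else -1.0
        return 2.0 * s / (gamma * gamma + gamma) - sign * 2.0 / gamma
    return 0.0


def F(s_vec, gamma, sigma):
    """Cost function of (15), using f_{gamma,sigma}(s) = f_gamma(s / sigma)."""
    return sum(f_gamma(v / sigma, gamma) for v in s_vec)


gamma = 0.5
value_at_knot = f_gamma(1.0, gamma)            # gamma / (1 + gamma)
slope_left = df_gamma(1.0, gamma)              # inner branch at the knot
slope_right = df_gamma(1.0 + 1e-9, gamma)      # outer branch just past the knot
zeros_estimate = F([0.0, 0.0, 3.0], gamma, 1.0)
```

The two slopes agree at the knot $|s| = 1$, which is the continuity of $f'_\gamma$ stated in Remark 8, and $F$ counts the two zero entries of the test vector.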



Definition 4: By $\|\mathbf{s}\|_{0,\sigma}$, we mean the number of elements
of $\mathbf{s}$ which have absolute values greater than $\sigma$. In other words,
$\|\mathbf{s}\|_{0,\sigma}$ denotes the $\ell^0$ norm of a clipped version of $\mathbf{s}$, in which
the components with absolute values less than or equal to $\sigma$
have been clipped to zero.
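
Definition 4 translates into a one-line count. The sketch below is illustrative only; the name `clipped_l0` and the sample vector are hypothetical.

```python
def clipped_l0(s, sigma):
    """Number of entries of s with absolute value strictly greater than sigma."""
    return sum(1 for v in s if abs(v) > sigma)


s = [0.05, -2.0, 0.3, 0.0]
count_small_sigma = clipped_l0(s, 0.1)   # counts -2.0 and 0.3
count_large_sigma = clipped_l0(s, 0.3)   # 0.3 is clipped, only -2.0 remains
```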

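B. Local concavity of the cost functions

In this subsection, we show that $F = F_{\gamma,\sigma}$ defined in (15),
with $\gamma = \gamma(n_0)$, $n_0 \le n$, and restricted to a certain subset of
$\mathcal{S}_{\mathbf{x}} = \{\mathbf{s} \in \mathbb{R}^m \mid \mathbf{A}\mathbf{s} = \mathbf{x}\}$, is concave. Then, we show that this
subset includes all points for which $F \ge m - n_0/(1 + \gamma)$.

Lemma 1: Let $F = F_{\gamma,\sigma}$, where $\gamma = \gamma(n_0)$ for
$n_0 \le n$, and let $\mathbf{A}$ satisfy the URP. Let $\mathcal{S}_{\mathbf{x}} = \{\mathbf{s} \in
\mathbb{R}^m \mid \mathbf{A}\mathbf{s} = \mathbf{x}\}$ and let $\mathcal{C}$ be the subset of $\mathcal{S}_{\mathbf{x}}$ consisting of those
solutions that have at most $n_0$ elements with absolute values
greater than $\sigma$, that is:

$$\mathcal{C} \triangleq \{\mathbf{s} \in \mathcal{S}_{\mathbf{x}} \mid \|\mathbf{s}\|_{0,\sigma} \le n_0\}. \tag{18}$$

Then the Hessian matrix of $F|_{\mathcal{C}}$, where $F|_{\mathcal{C}}$ denotes the
restriction of $F$ on $\mathcal{C}$, is negative semi-definite.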

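Proof: Let the linear transformation $T : \mathbb{R}^{m-n} \to \mathcal{S}_{\mathbf{x}}$ be
defined by $\mathbf{s} = T(\mathbf{v}) = \mathbf{D}^T\mathbf{v} + \mathbf{A}^\dagger\mathbf{x}$ for a constant $\mathbf{x}$. $T$
is clearly a linear isomorphism. Hence, instead of showing
that the Hessian of $F|_{\mathcal{C}}$ is negative semi-definite, we just need
to show that the Hessian of $G$ is negative semi-definite on
$T^{-1}(\mathcal{C}) \subseteq \mathbb{R}^{m-n}$, where $G \triangleq F \circ T$.
Assume $\mathbf{s} \in \mathcal{C}$. Clearly

$$\mathbf{H}_G(\mathbf{v}) = \mathbf{D}\,\mathbf{H}_F(\mathbf{s})\,\mathbf{D}^T,$$

where $\mathbf{v} = T^{-1}(\mathbf{s})$ and

$$\mathbf{H}_F(\mathbf{s}) = \mathrm{diag}\big(f''_{\gamma,\sigma}(s_1), \ldots, f''_{\gamma,\sigma}(s_m)\big)
= \frac{1}{\sigma^2}\,\mathrm{diag}\big(f''_\gamma(s_1/\sigma), \ldots, f''_\gamma(s_m/\sigma)\big).$$

Let $I$ be the set of indexes of those elements of $\mathbf{s} \in \mathcal{C}$ that
have absolute values greater than $\sigma$. From the definition of
$\mathcal{C}$, $|I| \le n_0$. To prove that $\mathbf{H}_G(\mathbf{v})$ is negative semi-definite,
we have to show that $\mathbf{u}^T\mathbf{D}\,\mathbf{H}_F(\mathbf{s})\,\mathbf{D}^T\mathbf{u} \le 0$ for all $\mathbf{u} \in \mathbb{R}^{m-n}$.
Defining $\mathbf{w} = \mathbf{D}^T\mathbf{u}$, we have $\mathbf{A}\mathbf{w} = \mathbf{A}\mathbf{D}^T\mathbf{u} = \mathbf{0}$ and, therefore,
$\mathbf{w} \in \mathrm{null}(\mathbf{A})$. Next we show that $\mathbf{w}^T\mathbf{H}_F(\mathbf{s})\mathbf{w} \le 0$ for all
$\mathbf{w} \in \mathrm{null}(\mathbf{A})$. We write:

$$\mathbf{w}^T\mathbf{H}_F(\mathbf{s})\mathbf{w} = \frac{1}{\sigma^2}\sum_{i=1}^m f''_\gamma(s_i/\sigma)\,w_i^2. \tag{19}$$

By setting $\mathbf{w}_I$ and $\mathbf{w}_{I^c}$ equal to the sub-vectors of $\mathbf{w}$ indexed
by $I$ and $I^c$, from (7) we have

$$\frac{\|\mathbf{w}_I\|^2}{\|\mathbf{w}_{I^c}\|^2} = \frac{\|\pi_I(\mathbf{w})\|^2}{\|\pi_{I^c}(\mathbf{w})\|^2} \le \gamma(|I|) \le \gamma(n_0) = \gamma, \tag{20}$$

and hence, using (17):

$$\mathbf{w}^T\mathbf{H}_F(\mathbf{s})\mathbf{w} \le \frac{2}{(1 + \gamma)\sigma^2}\left(\frac{\|\mathbf{w}_I\|^2}{\gamma} - \|\mathbf{w}_{I^c}\|^2\right) \le 0,$$

which completes the proof. ■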

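Corollary 1: Under the conditions of Lemma 1, $F = F_{\gamma,\sigma}$
is concave at every $\mathbf{s} \in \mathcal{S}$, where $\mathcal{S} = \{\mathbf{s} \in \mathcal{S}_{\mathbf{x}} \mid F(\mathbf{s}) \ge
m - n_0/(1 + \gamma)\}$. Moreover, the region $\mathcal{A} = \{\mathbf{s} \in \mathcal{S}_{\mathbf{x}} \mid F(\mathbf{s}) \ge
m - n_0/(2 + 2\gamma)\} \subseteq \mathcal{S}$ is convex.

Proof: To prove the first part we show that $\mathcal{S} \subseteq \mathcal{C}$, where
$\mathcal{C}$ is defined by (18). Let $\mathbf{s} \in \mathcal{S}$ and $I \triangleq \{1 \le i \le m \mid |s_i| > \sigma\}$.
Then $\|\mathbf{s}\|_{0,\sigma} = |I|$, and hence, to prove $\mathbf{s} \in \mathcal{C}$ we have to show
that $|I| \le n_0$. We write:

$$\forall i \in I: \; |s_i| > \sigma \;\Rightarrow\; 1 - f(s_i) > 1/(1 + \gamma) \tag{21}$$

$$\mathbf{s} \in \mathcal{S} \;\Rightarrow\; \frac{n_0}{1 + \gamma} \ge m - F(\mathbf{s}) = \sum_{i=1}^m \{1 - f(s_i)\} \ge \sum_{i \in I} \{1 - f(s_i)\}. \tag{22}$$

Substituting (21) in (22), we obtain $n_0/(1 + \gamma) > |I|/(1 + \gamma)$,
which completes the proof of the first part.

To prove the second part, we consider $\mathbf{s}_1, \mathbf{s}_2 \in \mathcal{A}$. By
definition, at most $n_0/2$ elements of $\mathbf{s}_1$ and of $\mathbf{s}_2$ can have absolute values
greater than $\sigma$. Hence, if we define $\mathbf{s}(t) = (1 - t)\mathbf{s}_1 + t\mathbf{s}_2$, for $0 \le t \le 1$,
at most $n_0$ elements of $\mathbf{s}(t)$ can have absolute values greater
than $\sigma$. We know $\dot{\mathbf{s}}(t) = \mathbf{s}_2 - \mathbf{s}_1 \in \mathrm{null}(\mathbf{A})$, and the Hessian of
$F$ is negative semi-definite on $\mathrm{null}(\mathbf{A})$ according to Lemma 1.
Hence, if we define $h(t) \triangleq F(\mathbf{s}(t))$, we obtain:

$$\ddot{h}(t) = \frac{d}{dt}\big((\nabla F)^T\dot{\mathbf{s}}\big) = \dot{\mathbf{s}}^T\mathbf{H}_F\,\dot{\mathbf{s}} \le 0.$$

Hence, $h$ is concave on the $[0, 1]$ interval, and for any $0 \le
t \le 1$, we have

$$F(\mathbf{s}(t)) = h(t) \ge t \cdot h(1) + (1 - t) \cdot h(0) \ge m - n_0/(2 + 2\gamma).$$

This implies that $\mathbf{s}(t) \in \mathcal{A}$, hence $\mathcal{A}$ is convex. ■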

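Corollary 2: Under the conditions of Lemma 1 and the
assumption that there exists a sparse solution $\mathbf{s}_0$ satisfying
$k = \|\mathbf{s}_0\|_0 < n_0/(2 + 2\gamma)$, by starting from any $\mathbf{s}$ satisfying
$F(\mathbf{s}) \ge m - n_0/(2 + 2\gamma)$ and moving on the steepest ascent
trajectory restricted to $\mathcal{S}_{\mathbf{x}}$, we reach the global maximum $\mathbf{s}_*$ of
$F|_{\mathcal{S}_{\mathbf{x}}}$, satisfying $F(\mathbf{s}_*) \ge m - k$. More precisely, the solution
of the differential equation

$$\begin{cases} \dot{\boldsymbol{\alpha}}(t) = \nabla F|_{\mathcal{S}_{\mathbf{x}}}(\boldsymbol{\alpha}(t)) \\ \boldsymbol{\alpha}(0) = \mathbf{s} \end{cases} \tag{23}$$

satisfies

$$\lim_{t \to \infty} \boldsymbol{\alpha}(t) = \mathbf{s}_*. \tag{24}$$

Proof: From Corollary 1 we know that $\mathcal{A} = \{\mathbf{s} \mid F(\mathbf{s}) \ge
m - n_0/(2 + 2\gamma)\}$ is a convex region. By starting from any
point in a convex region and moving on the steepest ascent
trajectory of a function which is concave on that region, we
achieve the global maximizer in that region. Therefore, the
steepest ascent trajectory leads to the maximizer $\mathbf{s}_* \in \mathcal{A}$.
Using the assumptions on the sparse solution, we have $\mathbf{s}_0 \in \mathcal{A}$.
Hence the maximizer clearly satisfies $F(\mathbf{s}_*) \ge F(\mathbf{s}_0) \ge m - k$. ■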



C. The narrow variation property

In this subsection, we introduce a notion of the narrow
variation property, which states that whenever the values
of $F_\sigma$ at two points exceed a certain threshold, those two
points are close to each other, in the sense that the Euclidean
distance between them is bounded by $O(m^{1/2}\gamma^{1/2}\sigma)$.
Before stating Lemma 2 we repeat Theorem 1 from [17].
This theorem states that if for each value of $\sigma$ we pick a point
$\mathbf{s}_\sigma$ on $\mathcal{S}_{\mathbf{x}}$ such that $F_\sigma(\mathbf{s}_\sigma)$ is greater than a certain value
$m - n + k$, then the sequence of these points converges to the
sparsest solution as $\sigma \to 0$.

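Theorem 1: Consider a family of univariate functions $f_\sigma$,
indexed by $\sigma$, $\sigma \in \mathbb{R}^+$, satisfying the set of conditions:

1) $\lim_{\sigma \to 0} f_\sigma(s) = 0$ ; for all $s \neq 0$

2) $f_\sigma(0) = 1$ ; for all $\sigma \in \mathbb{R}^+$

3) $0 \le f_\sigma(s) \le 1$ ; for all $\sigma \in \mathbb{R}^+$, $s \in \mathbb{R}$

4) For each positive values of $\nu$ and $\alpha$, there exists $\sigma_0 \in
\mathbb{R}^+$ that satisfies:

$$|s| > \alpha \;\Rightarrow\; f_\sigma(s) < \nu \; ; \text{ for all } \sigma < \sigma_0. \tag{25}$$

Let $F_\sigma(\mathbf{s}) = \sum_{i=1}^m f_\sigma(s_i)$. Assume that $\mathbf{A}$ satisfies the URP,
$\mathbf{s}_0 \in \mathcal{S}_{\mathbf{x}}$ satisfies $\|\mathbf{s}_0\|_0 = k < n/2$ and $\mathbf{s}_\sigma \in \mathcal{S}_{\mathbf{x}}$ satisfies
$F_\sigma(\mathbf{s}_\sigma) \ge m - n + k$. Then

$$\lim_{\sigma \to 0} \mathbf{s}_\sigma = \mathbf{s}_0. \tag{26}$$

Remark 1. Note that the conditions on $\mathbf{A}$ in Lemma 1 are
the same as in Theorem 1, and $f_{\gamma,\sigma}$ defined in (14) satisfies
all the conditions 1 to 4 of Theorem 1 for any arbitrary value
of $\gamma$.

The main idea of the following Lemma 2 (and its proof)
is very similar to that of Theorem 1. We prove that if the $F_\sigma$
values at two points $\mathbf{s}_1$ and $\mathbf{s}_2$ in $\mathcal{S}_{\mathbf{x}}$ are larger than $m -
n_0/(2 + 2\gamma)$, then the distance between $\mathbf{s}_1$ and $\mathbf{s}_2$ is bounded
by $2\sqrt{m(\gamma + 1)}\,\sigma$.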

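Lemma 2: Let $F = F_{\gamma,\sigma}$ where $\gamma = \gamma(n_0)$. If for two
points $\mathbf{s}_1$ and $\mathbf{s}_2$ of $\mathcal{S}_{\mathbf{x}}$ we have:

$$F(\mathbf{s}_i) \ge m - \frac{n_0}{2 + 2\gamma}, \quad i = 1, 2, \tag{27}$$

then:

$$\|\mathbf{s}_1 - \mathbf{s}_2\| \le 2\sqrt{m(\gamma + 1)}\,\sigma. \tag{28}$$

Moreover, if $\mathbf{s}_2 = \mathbf{s}_0$, we have a slightly stricter bound

$$\|\mathbf{s}_1 - \mathbf{s}_0\| \le \sqrt{m(\gamma + 1)}\,\sigma. \tag{29}$$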

Proof: The argument is similar to that of Lemma 1
of [17], but made a bit more rigorous. Having in mind the
proof of the first part of Corollary 1, observe that (27) implies
that $\mathbf{s}_1$ and $\mathbf{s}_2$ each have at most $n_0/2$ elements with absolute values
greater than $\sigma$. Hence, $\mathbf{s}_1 - \mathbf{s}_2$ has at most $n_0$ elements with
absolute values greater than $2\sigma$. Let $I$ index those elements of
$\mathbf{s}_1 - \mathbf{s}_2$ with absolute values greater than $2\sigma$. Then $|I| \le n_0$
and

$$\|\pi_{I^c}(\mathbf{s}_1 - \mathbf{s}_2)\|^2 \le |I^c|(2\sigma)^2 \le 4m\sigma^2. \tag{30}$$

Since $\mathbf{s}_1 - \mathbf{s}_2 \in \mathrm{null}(\mathbf{A})$, from (20) and (30) we get

$$\|\pi_I(\mathbf{s}_1 - \mathbf{s}_2)\|^2 \le 4m\sigma^2\gamma \tag{31}$$

and

$$\|\mathbf{s}_1 - \mathbf{s}_2\|^2 \le 4m\sigma^2(1 + \gamma), \tag{32}$$

which yields (28). If $\mathbf{s}_2 = \mathbf{s}_0$, we can conclude that $\mathbf{s}_1 - \mathbf{s}_2$ has
at most $n_0$ elements with absolute values greater than $\sigma$, and
hence

$$\|\mathbf{s}_1 - \mathbf{s}_0\|^2 \le m\sigma^2(1 + \gamma). \tag{33}$$



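D. Bounded variations of cost functions

Our cost functions have a nice property which the $\ell^0$ norm does
not, i.e. they are continuous. In Lemma 3 we show that the
derivative of $f$ is bounded, and as a result, small changes in
$\mathbf{s}$ result in small changes in $F(\mathbf{s})$.

Lemma 3: For $f = f_{\gamma,\sigma}$ and $F = F_{\gamma,\sigma}$:

$$|f'(s)| \le \frac{2}{(1 + \gamma)\sigma} \tag{34}$$

and

$$|F(\mathbf{s}_1) - F(\mathbf{s}_2)| \le \frac{2\sqrt{m}}{(1 + \gamma)\sigma}\,\|\mathbf{s}_1 - \mathbf{s}_2\| \tag{35}$$

for any $s \in \mathbb{R}$ and $\mathbf{s}_1, \mathbf{s}_2 \in \mathbb{R}^m$.

Proof: (34) is a straightforward conclusion from (16).
To prove (35), note that, since each of the $m$ components of the
gradient is bounded by (34), for any $\mathbf{s} \in \mathbb{R}^m$ we have

$$\|\nabla F(\mathbf{s})\| \le \frac{2\sqrt{m}}{(1 + \gamma)\sigma}, \tag{36}$$

where $\nabla F$ denotes the gradient of $F : \mathbb{R}^m \to \mathbb{R}$. Moreover,
using the mean value theorem, for any $\mathbf{s}_1$ and $\mathbf{s}_2$ there exists
an $\mathbf{s} \in \mathbb{R}^m$ such that

$$F(\mathbf{s}_1) - F(\mathbf{s}_2) = \nabla F(\mathbf{s})^T(\mathbf{s}_1 - \mathbf{s}_2). \tag{37}$$

Therefore:

$$|F(\mathbf{s}_1) - F(\mathbf{s}_2)| = |\nabla F(\mathbf{s})^T(\mathbf{s}_1 - \mathbf{s}_2)|
\le \|\nabla F(\mathbf{s})\| \cdot \|\mathbf{s}_1 - \mathbf{s}_2\| \le \frac{2\sqrt{m}}{(1 + \gamma)\sigma}\,\|\mathbf{s}_1 - \mathbf{s}_2\|. \tag{38}$$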



E. The choice of parameters of the algorithm

At this point we have acquired the necessary tools for
designing a sequence of $\sigma$ values needed to successfully
maximize $F_{\gamma,\sigma}$. The question remaining to be solved is how,
after finding the global maximum of $F_{\gamma,\sigma}$ for some value
of $\sigma$, we choose the next value of $\sigma$ so that we are guaranteed
to be in a (locally) concave area. More specifically,
Corollary 2 ensures that by starting from any point $\mathbf{s}$ satisfying
$F_{\gamma,\sigma}(\mathbf{s}) \ge m - n_0/(2 + 2\gamma)$ and following the steepest ascent
trajectory of $F_{\gamma,\sigma}$, we end at the global maximum $\mathbf{s}_*$ of $F_{\gamma,\sigma}$,
satisfying $F_{\gamma,\sigma}(\mathbf{s}_*) \ge m - k$. The question we study next is
how, knowing $F_{\gamma,\sigma}(\mathbf{s}_*) \ge m - k$, we can choose the next value
$\sigma'$ subject to $F_{\gamma,\sigma'}(\mathbf{s}_*) \ge m - n_0/(2 + 2\gamma)$. In Lemma 4 we
present a constant $c$ for which $\sigma' = c\sigma$ satisfies this condition.



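Lemma 4: For constants $B > A > 0$, let us define

$$c \triangleq \frac{2m}{2m + B - A}. \tag{39}$$

Then we have the following result:

$$\text{If } F_{\gamma,\sigma}(\mathbf{s}) \ge m - A, \text{ then } F_{\gamma,c\sigma}(\mathbf{s}) \ge m - B, \tag{40}$$

for any $\mathbf{s} \in \mathbb{R}^m$. Moreover

$$F_{\gamma,\sigma}(\mathbf{s}) \ge m - \frac{\|\mathbf{s}\|^2}{(1 + \gamma)\sigma^2}. \tag{41}$$

Proof: For (41) note that:

$$f_\gamma(s/\sigma) \ge 1 - \frac{s^2}{(1 + \gamma)\sigma^2} \;\Rightarrow\; F_{\gamma,\sigma}(\mathbf{s}) \ge m - \frac{\|\mathbf{s}\|^2}{(1 + \gamma)\sigma^2}.$$

Let us define:

$$a(t) \triangleq F_{\gamma,\sigma/(1+t)}(\mathbf{s}) = F_{\gamma,\sigma}(\mathbf{s} + \mathbf{s}t) = \sum_{i=1}^m f_{\gamma,\sigma}(s_i + s_i t)$$

for $t \ge 0$. Having $|f'_{\gamma,\sigma}(s)| \le 2/((1 + \gamma)\sigma)$ from (34), and
$f'_{\gamma,\sigma}(s) = 0$ for $|s| > (1 + \gamma)\sigma$ from (16), we will have

$$\left|\frac{d}{dt}a(t)\right| = \left|\sum_{i=1}^m \frac{d}{dt} f_{\gamma,\sigma}(s_i + s_i t)\right| = \left|\sum_{i=1}^m s_i f'_{\gamma,\sigma}(s_i + s_i t)\right|
\le \sum_{|s_i| \le \sigma(1 + \gamma)} |s_i| \cdot |f'_{\gamma,\sigma}(s_i + s_i t)| \le 2m.$$

Hence, by choosing $t_0 = (B - A)/(2m)$, we have

$$|a(t_0) - a(0)| \le t_0\left|\frac{d}{dt}a(t)\right| \le B - A$$

for some $0 \le t \le t_0$. Then, choosing $c = 1/(1 + t_0)$ as in (39), we
have

$$|F_{c\sigma}(\mathbf{s}) - F_\sigma(\mathbf{s})| = |a(t_0) - a(0)| \le B - A, \tag{42}$$

which leads to (40). ■

Using Lemma 4, the following theorem states a sufficient
condition for the convergence of an asymptotic version of SL0,
in which the steepest ascent follows exactly the steepest ascent
trajectory (i.e. the case $\mu \to 0$ and $L \to \infty$).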

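Theorem 2: Assume $\mathbf{A}$ satisfies the URP and $f$ is as
defined in (13), and also $k = \|\mathbf{s}_0\|_0 < n_0/(2 + 2\gamma)$. Let
$\hat{\mathbf{s}} = \mathrm{argmin}_{\mathbf{A}\mathbf{s} = \mathbf{x}} \|\mathbf{s}\| = \mathbf{A}^\dagger\mathbf{x}$ and:

$$\sigma_1 = \frac{\|\hat{\mathbf{s}}\|}{\sqrt{k(1 + \gamma)}}, \tag{43}$$

$$c = \frac{2m}{2m + n_0/(2 + 2\gamma) - k} < 1. \tag{44}$$

If we choose the geometric sequence of $\sigma$ according to $\sigma_{j+1} =
c\sigma_j$, and set $\mathbf{s}_1 = \hat{\mathbf{s}}$ in the first step, and in each subsequent
step, i.e. $j \ge 2$, start with $\mathbf{s}_{j-1}$ and move on the steepest
ascent trajectory of $F_{\sigma_j}$ to reach the maximizer $\mathbf{s}_j$, then at
each step:

$$F_{\sigma_j}(\mathbf{s}_j) \ge m - k$$

and

$$\lim_{j \to \infty} \mathbf{s}_j = \mathbf{s}_0.$$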
Proof: By induction on $j$. First note that by substituting
$\sigma_1$ defined by (43) in (41), we have $F_{\sigma_1}(\mathbf{s}_1) = F_{\sigma_1}(\hat{\mathbf{s}}) \ge
m - k$. Moreover, by substituting $c$ defined by (44) in Lemma 4,
we conclude

$$F_\sigma(\mathbf{s}) \ge m - k \;\Rightarrow\; F_{c\sigma}(\mathbf{s}) \ge m - \frac{n_0}{2 + 2\gamma}, \tag{45}$$

for any $\mathbf{s} \in \mathbb{R}^m$. Now, to complete the induction, assume
$F_{\sigma_{j-1}}(\mathbf{s}_{j-1}) \ge m - k$. Then, from (45):

$$F_{\sigma_j}(\mathbf{s}_{j-1}) \ge m - \frac{n_0}{2 + 2\gamma}. \tag{46}$$

Therefore, according to Corollary 2, the $\mathbf{s}_j$ which is achieved
by starting at $\mathbf{s}_{j-1}$ and following the steepest ascent trajectory
of $F_{\sigma_j}$ satisfies $F_{\sigma_j}(\mathbf{s}_j) \ge m - k$.

To prove the second part of Theorem 2, note that $\sigma_j \to 0$
as $j \to \infty$ (since $c < 1$) and $m - k \ge m - n + k$ (since
$k < n_0/2 \le n/2$), hence the sequence of $\mathbf{s}_j$ satisfies the
conditions of Theorem 1. The same conclusion also follows
from Lemma 2, since $F_{\sigma_j}(\mathbf{s}_j) \ge m - k \ge m - n_0/(2 + 2\gamma)$
results in

$$\|\mathbf{s}_j - \mathbf{s}_0\| \le \sqrt{m(\gamma + 1)}\,\sigma_j \to 0. \tag{47}$$



Remark 1. Theorem 2 proves the convergence of an
asymptotic version of SL0, in which the internal loop steps
precisely along the steepest ascent trajectory. This corresponds
to $\mu \to 0$ and $L \to \infty$ in Fig. 1. We will discuss later in
Section IV the case of $\mu > 0$ (discrete steps in the steepest
ascent directions), and propose a value for $\mu$ which guarantees
the convergence, provided that the internal loop is repeated
until the convergence is achieved (corresponding to $L \to \infty$).
Finally, Section VI proposes a value for $L$ that guarantees the
convergence and that completes the convergence analysis of
SL0.

Remark 2. In [17] (Remark 5, Section III) we heuristically
justified that $\sigma_1$ should be chosen proportional to the
maximum absolute value of the elements of $\hat{\mathbf{s}}$, i.e. $\max_i |\hat{s}_i|$. This
choice is now better justified by Theorem 2, Eq. (43).

Remark 3. In Experiment 2 of [17] we had observed that
the value of $c$ depended on the sparsity ($k$) of the solution, and
not as much on any other parameter of SL0 (see Fig. 3 of [17]).
The optimal value of $c$ grew with increasing $k$ and tended to
1 as $k \to n/2$. Equation (44) supports this observation, as the
value of $c$ depends only on the value of $k$ (and of course, the
system scale), and $c \to 1$ as $k \to n_0/(2(1 + \gamma))$.
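
The contraction factor of (44) and the resulting geometric $\sigma$ sequence are straightforward to evaluate. The sketch below is illustrative: the function names are hypothetical, the starting value `sigma1` is taken as an input rather than computed, and the numbers ($m = 100$, $n_0 = 20$, $\gamma = 1$, $k = 2$) are arbitrary values chosen only to exercise the formula.

```python
def contraction_factor(m, n0, gamma, k):
    """c of (44): 2m / (2m + n0 / (2 + 2 gamma) - k)."""
    margin = n0 / (2.0 + 2.0 * gamma) - k
    if margin <= 0:
        raise ValueError("requires k < n0 / (2 + 2*gamma)")
    return 2.0 * m / (2.0 * m + margin)


def sigma_sequence(sigma1, c, J):
    """Geometric sequence sigma_{j+1} = c * sigma_j with J terms."""
    return [sigma1 * c ** j for j in range(J)]


c = contraction_factor(m=100, n0=20, gamma=1.0, k=2)   # 200 / 203
sigmas = sigma_sequence(sigma1=1.0, c=c, J=4)
c_near_limit = contraction_factor(m=100, n0=20, gamma=1.0, k=4.999)
```

With these numbers $c = 200/203$, so $\sigma$ shrinks by about 1.5% per outer iteration; as $k$ approaches $n_0/(2 + 2\gamma) = 5$, the factor approaches 1, in line with Remark 3.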

Corollary 3: Asymptotic SL0 (when $\mu \to 0$ and $L \to \infty$)
converges to the sparse solution if

$$\alpha\,\delta^{\min}_{\lceil 2k\alpha \rceil} + \|\mathbf{A}\|_2^2 < \alpha, \tag{48}$$

where $k = \|\mathbf{s}\|_0$, $\alpha > 1$ is an arbitrary constant, $\delta^{\min}_{\lceil 2k\alpha \rceil}$ is the
lower ARIC, and $\|\mathbf{A}\|_2$ denotes the spectral norm of $\mathbf{A}$.

Proof: If (48) holds, by setting $n_0 = \lceil 2\alpha k \rceil$ and using
(12), it is easy to see that $\gamma(n_0) + 1 < \alpha$. Hence, the condition
of Theorem 2, i.e. $\|\mathbf{s}\|_0 < n_0/(2 + 2\gamma(n_0))$, holds and the
convergence is guaranteed. ■



III. Large Random Gaussian Matrices

Our sparsity constraint for successful recovery of the sparse
solution is of the form $k < n_0/(2 + 2\gamma)$, where $\gamma = \gamma(n_0)$
depends on the matrix $\mathbf{A}$. It is not practical to precisely
calculate $\gamma(n_0)$ for large scale systems since the computational
complexity grows exponentially^5. However, in the case of
random Gaussian matrices we can find reasonable almost
sure (a.s.) upper bounds on $\gamma(n_0)$, which make it possible
to compare our results with the ones for $\ell^1$-minimization



J. In this section we assume that A has 
independent identically distributed (i.i.d) entries drawn from a 
normal distribution with zero mean and variance 1/n. 

We use Theorem II.13 of [27]. Let $G$ be an $l \times n$ random
matrix with i.i.d. entries drawn from a $N(0, 1/n)$ distribution.
We are interested in the singular values of $G$, or equivalently, the
eigenvalues of $G^T G$, and, in particular, the smallest and the
largest one. In [27], [26], the authors prove that

$$P\{\sigma_{\max}(G) > 1 + \sqrt{l/n} + r\} < \exp(-nr^2/2) \tag{49}$$

and also

$$P\{\sigma_{\min}(G) < 1 - \sqrt{l/n} - r\} < \exp(-nr^2/2). \tag{50}$$

They prove the above inequalities for the case $l < n$. It is
not difficult to check that (49) holds for the case $l > n$ as
well, since from the definition of $G$, $\sqrt{n/l}\,G^T$ is an $n \times l$ normally
distributed matrix with variance $1/l$. In this case, we can use
(49) to conclude that

$$P\{\sigma_{\max}(\sqrt{n/l}\,G^T) > 1 + \sqrt{n/l} + r\} < \exp(-lr^2/2).$$

Noting that $\sigma_{\max}(\sqrt{n/l}\,G^T) = \sqrt{n/l}\,\sigma_{\max}(G)$ and setting
$r' = r\sqrt{l/n}$ we get the desired result.

In the following theorem, using arguments similar to the ones
used for bounding the symmetric and asymmetric RICs [23],
[24], [25], [20], we prove that with high probability the value
of $\gamma$ is bounded.

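Theorem 3: If $A$ is a random Gaussian matrix with
i.i.d. zero mean entries of variance $1/n$ and if $\alpha = n/m$ and
$\beta = n_0/m$ are fixed, then

$$P\left\{\gamma(n_0) > \frac{(1 + \sqrt{m/n} + \epsilon)^2}{(1 - \sqrt{n_0/n} - r)^2}\right\} < \exp(-nr^2/2 + nr_0^2/2) + \exp(-n\epsilon^2/2), \tag{51}$$

which tends to zero as $m \to \infty$, provided that $\epsilon > 0$ and
$r > r_0$ where

$$r_0 \triangleq \sqrt{(2\beta/\alpha)\log(e/\beta)} \tag{52}$$

and $e = \exp(1)$ denotes Euler's constant (the base of the
natural logarithm).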

Proof: Let $I$ be some subset of $\{1, \dots, m\}$ with $|I| = n_0$.
Then, $A_I^T$ is $n_0 \times n$ and

$$P\{\sigma_{\max}(A) > 1 + \sqrt{m/n} + \epsilon\} < \exp(-n\epsilon^2/2) \tag{53}$$



^5 Even a deterministic upper bound on $\gamma$ using (12) is not practical. The
upper bound depends on the Euclidean norm of $A$ and the lower ARIC. Precise
calculation of the ARIC requires enumerating all possible $n_0$-column submatrices
of $A$ and computing their smallest singular values.
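and

$$P\{\sigma_{\min}(A_I) < 1 - \sqrt{n_0/n} - r\} < \exp(-nr^2/2) \tag{54}$$

for any subset with $|I| = n_0$. There are a total of $\binom{m}{n_0}$ such subsets,
which means

$$P\left\{\min_I \sigma_{\min}(A_I) < 1 - \sqrt{n_0/n} - r\right\} < \binom{m}{n_0}\exp(-nr^2/2). \tag{55}$$

Then, using (11) we have

$$P\left\{\gamma(n_0) > \frac{(1 + \sqrt{m/n} + \epsilon)^2}{(1 - \sqrt{n_0/n} - r)^2}\right\} < \binom{m}{n_0}\exp(-nr^2/2) + \exp(-n\epsilon^2/2). \tag{56}$$

From

$$\binom{m}{n_0} \le \left(\frac{me}{n_0}\right)^{n_0} = \exp\left(n_0 \log(me/n_0)\right) \tag{57}$$

we get

$$P\left\{\gamma(n_0) > \frac{(1 + \sqrt{m/n} + \epsilon)^2}{(1 - \sqrt{n_0/n} - r)^2}\right\} < \exp\left(n_0\log(me/n_0) - nr^2/2\right) + \exp(-n\epsilon^2/2). \tag{58}$$

If we assume $\alpha = n/m$ and $\beta = n_0/m$ are fixed, then by
defining $r_0$ as in (52), we obtain (51) as $m \to \infty$. ■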
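The role of $r_0$ in the union bound can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names `r0` and `log_union_term` and the sample ratios are illustrative assumptions. It evaluates the logarithm of the first term of (58), and optionally the same quantity with the exact binomial coefficient, to show that the exponent vanishes at $r = r_0$ and becomes increasingly negative with $m$ once $r > r_0$.

```python
import math


def r0(alpha, beta):
    """Threshold r_0 of (52): sqrt((2*beta/alpha) * log(e/beta))."""
    return math.sqrt(2.0 * beta / alpha * math.log(math.e / beta))


def log_union_term(m, alpha, beta, r, exact=False):
    """Natural log of binom(m, n0) * exp(-n r^2 / 2), n = alpha*m, n0 = beta*m.

    exact=False replaces the binomial coefficient by the bound (m e / n0)^n0
    from (57); exact=True evaluates it through the log-gamma function.
    """
    n = alpha * m
    n0 = round(beta * m)
    if exact:
        log_binom = (math.lgamma(m + 1) - math.lgamma(n0 + 1)
                     - math.lgamma(m - n0 + 1))
    else:
        log_binom = n0 * math.log(m * math.e / n0)
    return log_binom - n * r * r / 2.0


if __name__ == "__main__":
    alpha, beta = 0.5, 0.005  # hypothetical ratios n/m and n0/m
    print("r0 =", r0(alpha, beta))
    for m in (1000, 10000, 100000):
        print(m, log_union_term(m, alpha, beta, r=0.5))
```

With these sample ratios the printed exponent scales linearly in $m$, which is the mechanism behind the limit in Theorem 3.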
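Corollary 4: Let us define $\gamma(\alpha, \beta)$ as follows:

$$\gamma(\alpha,\beta) \triangleq \frac{(1 + \sqrt{1/\alpha})^2}{\left(1 - \sqrt{\beta/\alpha} - \sqrt{(2\beta/\alpha)\log(e/\beta)}\right)^2}$$

if $1 - \sqrt{\beta/\alpha} - \sqrt{(2\beta/\alpha)\log(e/\beta)} > 0$, and otherwise
$\gamma(\alpha,\beta) \to +\infty$. Let also

$$\rho(\alpha) \triangleq \max_{0 < \beta < \alpha} \frac{\beta}{2 + 2\gamma(\alpha,\beta)}. \tag{59}$$

Then, $\rho(\alpha) > 0$ for any $\alpha > 0$. Moreover, we can guarantee
that for almost every large system with ratio $n/m \to \alpha$, the
asymptotic SLO can recover the sparse solutions satisfying
$\|s\|_0 < \rho(\alpha)m$.

Proof: To show $\rho(\alpha) > 0$, simply note that

$$\lim_{\beta \to 0^+} \frac{\beta}{2 + 2\gamma(\alpha,\beta)} = 0^+. \tag{60}$$

For the second part, it suffices to apply Theorem 3 with $n_0 =
\lceil \beta^* m \rceil$, where $\beta^*$ is the value of $\beta$ that attains the maximum in (59). ■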


IV. Stability of the internal loop and its exponential convergence rate

From Fig. 1, the steepest ascent steps in SLO are of the
form:

$$s_{i+1} = s_i + \mu\sigma^2 D^T D \nabla F|_{s_i} \tag{61}$$

where $D^T D$ is the orthogonal projection onto null($A$) and $\mu$
is the step size parameter. Until now, we have considered
convergence of what we refer to as the asymptotic version of SLO
(corresponding to $\mu \to 0$ and $L \to \infty$), in which the steps of
the internal loop of Fig. 1 follow exactly along the steepest
ascent trajectory. In this section, we study how to choose
the parameter $\mu$. For this part of the analysis, we assume the
internal loop is repeated until convergence (corresponding to
$L \to \infty$).
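A single step of the form (61) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes that $A$ has orthonormal rows, so that the projection onto null($A$) is $v - A^T(Av)$, and it uses the Gaussian kernel $f_\sigma(s) = \exp(-s^2/2\sigma^2)$ as a stand-in for the function family analysed in the paper. All function names are hypothetical.

```python
import math


def project_null(A, v):
    """Project v onto null(A), assuming the rows of A are orthonormal."""
    coeffs = [sum(a * b for a, b in zip(row, v)) for row in A]
    return [vi - sum(c * row[i] for c, row in zip(coeffs, A))
            for i, vi in enumerate(v)]


def smoothed_objective(s, sigma):
    """F(s) = sum_i exp(-s_i^2 / (2 sigma^2)); the Gaussian kernel is an assumption."""
    return sum(math.exp(-si * si / (2.0 * sigma ** 2)) for si in s)


def grad_objective(s, sigma):
    """Gradient of the assumed Gaussian objective."""
    return [-(si / sigma ** 2) * math.exp(-si * si / (2.0 * sigma ** 2))
            for si in s]


def ascent_step(A, s, sigma, mu):
    """One projected steepest-ascent step: s + mu * sigma^2 * P * grad F(s)."""
    direction = project_null(A, grad_objective(s, sigma))
    return [si + mu * sigma ** 2 * di for si, di in zip(s, direction)]


if __name__ == "__main__":
    A = [[0.6, 0.8, 0.0]]   # a single unit-norm row, so A A^T = 1
    s = [0.6, 0.8, 0.0]     # minimum-norm solution of A s = 1
    s_next = ascent_step(A, s, sigma=1.0, mu=0.5)
    print(s_next, smoothed_objective(s_next, 1.0))
```

Because the update direction lies in null($A$), the iterate stays on the feasible set $As = x$ while the smoothed objective does not decrease for a sufficiently small $\mu$.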

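Lemma 5: Let $F = F_\sigma^{\gamma'}$ where $\gamma' > \gamma = \gamma(n_0)$ and
$\sigma > 0$ is arbitrary. Let also $\lambda_{\min}$ and $\lambda_{\max}$ denote the smallest
and largest eigenvalues of $-D\sigma^2 H_F(s)D^T$ respectively (note
that the values of $\lambda_{\min}$ and $\lambda_{\max}$ depend on $s$). Then for all
$s \in \mathbb{R}^m$

$$\lambda_{\max} \le \frac{2}{1 + \gamma'} \tag{62}$$

and for all $s \in \mathcal{A}$

$$\lambda_{\min} \ge \frac{2(\gamma' - \gamma)}{(1 + \gamma)(\gamma' + \gamma'^2)}. \tag{63}$$

Proof: For convenience, let us define

$$\lambda'_{\max} \triangleq \frac{2}{1 + \gamma'}, \qquad \lambda'_{\min} \triangleq \frac{2(\gamma' - \gamma)}{(1 + \gamma)(\gamma' + \gamma'^2)}, \tag{64}$$

so that we need to show that

$$\lambda_{\max} \le \lambda'_{\max}, \qquad \lambda_{\min} \ge \lambda'_{\min}. \tag{65}$$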



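We know that for any symmetric matrix $M$ with maximum and
minimum eigenvalues $\lambda_{\max}(M)$ and $\lambda_{\min}(M)$, $M - \lambda\mathbf{I}$ is
positive semi-definite if and only if $\lambda \le \lambda_{\min}(M)$. Moreover,
$M - \lambda\mathbf{I}$ is negative semi-definite if and only if $\lambda \ge \lambda_{\max}(M)$.

To prove (65), we show that $D(\sigma^2 H_F(s) + \lambda'_{\max}\mathbf{I})D^T$ is
positive semi-definite for all $s \in \mathbb{R}^m$, and $D(\sigma^2 H_F(s) +
\lambda'_{\min}\mathbf{I})D^T$ is negative semi-definite as long as $s \in \mathcal{A}$.
Following the steps of the proof of Lemma 1, the former follows
from

$$v^T\left(\sigma^2 H_F(s) + \lambda'_{\max}\mathbf{I}\right)v \ge \left(\lambda'_{\max} - \frac{2}{1+\gamma'}\right)\|v\|^2 \ge 0. \tag{66}$$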



To show the second assertion, from (20) we obtain
$\|w_I\|^2/\|w_{I^c}\|^2 \le \gamma$. Then we have

$$w^T\left(\sigma^2 H_F(s) + \lambda'_{\min}\mathbf{I}\right)w \le 0. \tag{67}$$

Hence, $D(\sigma^2 H_F(s) + \lambda'_{\min}\mathbf{I})D^T$ and $D(\sigma^2 H_F(s) +
\lambda'_{\max}\mathbf{I})D^T$ are negative and positive semi-definite respectively,
and (65) holds. ■



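Theorem 4: Let $F = F_\sigma^{\gamma'}$, where $\gamma' > \gamma = \gamma(n_0)$.
Suppose also that:

$$F(s_i) \ge m - \frac{n_0}{2 + 2\gamma}. \tag{68}$$

Then, by setting

$$\mu = 2/(\lambda'_{\min} + \lambda'_{\max}), \tag{69}$$

where $\lambda'_{\max}$ and $\lambda'_{\min}$ are as defined in (64), it is guaranteed
that

$$\|s_{i+1} - s_{opt}\| \le \mathrm{CR}'\,\|s_i - s_{opt}\|, \tag{70}$$

where $s_{opt}$ is the maximizer of $F$ on $\mathcal{S}_x$, $s_{i+1}$ is as defined
in (61), and $\mathrm{CR}' \triangleq (\lambda'_{\max} - \lambda'_{\min})/(\lambda'_{\max} + \lambda'_{\min})$ determines
the convergence rate. Moreover:

$$F(s_{i+1}) \ge F(s_i). \tag{71}$$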

Proof: The proof consists of the following steps.
Step 1: From (68), $s_i \in \mathcal{A}$ and $s_{opt} \in \mathcal{A}$, where $\mathcal{A}$ is as
defined in Corollary 1. From Corollary 1, $\mathcal{A}$ is convex and $F$
is concave on $\mathcal{A}$. Hence, $s_{opt}$ satisfies:

$$s_{opt} = s_{opt} + \mu\sigma^2 D^T D\nabla F|_{s_{opt}} \iff D\nabla F|_{s_{opt}} = 0. \tag{72}$$

Subtracting (72) from (61), we have

$$s_{i+1} - s_{opt} = s_i - s_{opt} + \mu\sigma^2 D^T D\left(\nabla F|_{s_i} - \nabla F|_{s_{opt}}\right). \tag{73}$$

Multiplying by $D$ and setting $DD^T = \mathbf{I}$, we get

$$D(s_{i+1} - s_{opt}) = D(s_i - s_{opt}) + \mu\sigma^2 D\left(\nabla F|_{s_i} - \nabla F|_{s_{opt}}\right). \tag{74}$$

From the mean value theorem, there exists a $t \in [0, 1]$ such
that $s' = ts_{opt} + (1 - t)s_i$ satisfies:

$$D(s_{i+1} - s_{opt}) = D(s_i - s_{opt}) + \mu\sigma^2 DH_F(s')(s_i - s_{opt}). \tag{75}$$

Since $\{s_i, s_{opt}\} \subset \mathcal{A}$, it means that $s' \in \mathcal{A}$. Also, since $(s_i -
s_{opt}) \in$ null($A$), it is equal to its projection onto null($A$), that
is, $s_i - s_{opt} = D^T D(s_i - s_{opt})$. Therefore, the above equation
can be written as

$$D(s_{i+1} - s_{opt}) = \left[\mathbf{I} + \mu\sigma^2 DH_F(s')D^T\right]D(s_i - s_{opt}). \tag{76}$$

Since $(s_i - s_{opt})$ and $(s_{i+1} - s_{opt})$ are both in null($A$), from
(76) we can write

$$\|s_{i+1} - s_{opt}\| \le \|\mathbf{I} + \mu\sigma^2 DH_F(s')D^T\|_2 \cdot \|s_i - s_{opt}\|. \tag{77}$$
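Step 2: Let us define the Rate of Convergence (CR) as

$$\begin{aligned}
\mathrm{CR} &= \|\mathbf{I} + \mu\sigma^2 DH_F(s')D^T\|_2 \\
&= \max\{|1 - \mu\lambda_{\min}|,\ |1 - \mu\lambda_{\max}|\} \\
&= \max\{1 - \mu\lambda_{\min},\ -1 + \mu\lambda_{\min},\ -1 + \mu\lambda_{\max},\ 1 - \mu\lambda_{\max}\} \\
&= \max\{1 - \mu\lambda_{\min},\ -1 + \mu\lambda_{\max}\},
\end{aligned} \tag{78}$$

where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues
of $-D\sigma^2 H_F(s')D^T$. The value of $\mu$ that optimizes CR is
$\mu = 2/(\lambda_{\max} + \lambda_{\min})$, which results in

$$\mathrm{CR} = 1 - \frac{2\lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} = \frac{\kappa - 1}{\kappa + 1}, \tag{79}$$

where $\kappa = \kappa(-\sigma^2 DH_F(s')D^T) = \lambda_{\max}/\lambda_{\min}$ denotes the
condition number of the matrix $-\sigma^2 DH_F(s')D^T$. With this definition, we have

$$\|s_{i+1} - s_{opt}\| \le \mathrm{CR}\,\|s_i - s_{opt}\|. \tag{80}$$

Computing $\lambda_{\max}$ and $\lambda_{\min}$ in each step is not practical for
large scale systems. Instead, we can find bounds on their values
using (65). Of course, these bounds do not depend on $s'$.
Choosing $\mu$ according to (69) and considering (78), we have

$$\|\mathbf{I} + \mu\sigma^2 DH_F(s')D^T\|_2 \le \frac{\lambda'_{\max} - \lambda'_{\min}}{\lambda'_{\max} + \lambda'_{\min}} = \mathrm{CR}'. \tag{81}$$

Taking (80) together with (81), we obtain (70).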
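The step-size choice in Step 2 can be verified with a few lines of code. The sketch below is illustrative and its function names are hypothetical: it evaluates the contraction factor $\max\{|1-\mu\lambda_{\min}|, |1-\mu\lambda_{\max}|\}$ from (78), the step $\mu = 2/(\lambda_{\min}+\lambda_{\max})$, and the bounds $\lambda'_{\min}$, $\lambda'_{\max}$ of (64) for sample values of $\gamma$ and $\gamma'$.

```python
def contraction_factor(mu, lam_min, lam_max):
    """Spectral norm of I - mu*M when the eigenvalues of M lie in [lam_min, lam_max]."""
    return max(abs(1.0 - mu * lam_min), abs(1.0 - mu * lam_max))


def optimal_step(lam_min, lam_max):
    """Step size 2/(lam_min + lam_max) and the contraction factor it achieves."""
    mu = 2.0 / (lam_min + lam_max)
    return mu, contraction_factor(mu, lam_min, lam_max)


def eigenvalue_bounds(gamma, gamma_p):
    """Bounds (lambda'_min, lambda'_max) of (64) for gamma' = gamma_p > gamma."""
    lam_max = 2.0 / (1.0 + gamma_p)
    lam_min = 2.0 * (gamma_p - gamma) / ((1.0 + gamma) * (gamma_p + gamma_p ** 2))
    return lam_min, lam_max


if __name__ == "__main__":
    lam_min, lam_max = eigenvalue_bounds(gamma=1.0, gamma_p=2.0)  # sample values
    mu, cr = optimal_step(lam_min, lam_max)
    print("mu =", mu, "CR' =", cr, "kappa' =", lam_max / lam_min)
```

For these sample values the condition number is 4, so the contraction factor equals $(\kappa-1)/(\kappa+1) = 0.6$, and any other step size gives a factor at least as large.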



Step 3: From the second order Taylor expansion of $F$
around $s_i$ we have

$$F(s_{i+1}) - F(s_i) = (s_{i+1} - s_i)^T\nabla F + \tfrac{1}{2}(s_{i+1} - s_i)^T H_F (s_{i+1} - s_i) \tag{82}$$

where $\nabla F = \nabla F|_{s_i}$ and $H_F = H_F(s'')$, for some point $s''$
satisfying $s'' = ts_i + (1 - t)s_{i+1}$ for some $0 \le t \le 1$. Then,
by substituting $s_{i+1} - s_i$ from (61) and factoring we get

$$F(s_{i+1}) - F(s_i) = \frac{\mu^2\sigma^2}{2}\,\nabla F^T D^T\left(\sigma^2 DH_F D^T + (2/\mu)\mathbf{I}\right)D\nabla F. \tag{83}$$

From (69) and (65) we have

$$\lambda_{\max} \le 2/\mu. \tag{84}$$

Now (71) is a straightforward conclusion of (83) and (84). ■

Remark 1. The value of $\gamma < \gamma' < (n_0/2k) - 1$ should
be chosen carefully. If $\gamma' \to \gamma$, then $\lambda'_{\max}/\lambda'_{\min} \to \infty$ and
$\mathrm{CR}' \to 1$. If $\gamma' \to (n_0/2k) - 1$, then $c \to 1$ in (44), and the
computational cost tends to infinity. In Section VI we discuss
how to choose $\gamma'$ to have a reasonable convergence.

Remark 2. Theorems 2 and 4 prove convergence of SLO,
provided that the internal loop is repeated until convergence is
reached. The question that remains to be answered is how to select
the value of $L$ to guarantee that the internal loop is repeated
until convergence is reached. This question is answered in
Section VI.

V. The noisy case 

Thus far we discussed the convergence and stability of
SLO in the noiseless case. Theorem 3 of [17] states that the
maximizer of $F_\sigma$ is a good estimator of the sparse solution even
in the noisy case. In this section we investigate the choice of
parameters that assure local concavity and, hence, convergence
of SLO when the data contains noise.

The following theorem is a modification of Theorem 3
of [17] and it provides conditions for convergence in noise.

Theorem 5: Let $\mathcal{S}_\epsilon = \{s \mid \|As - x\| < \epsilon\}$, where $\epsilon$ is
an arbitrary positive number, and assume that matrix $A$ and
functions $f_\sigma$ satisfy the conditions of Theorem 2. Let $s_0 \in \mathcal{S}_\epsilon$
be a sparse solution. Assume the condition $k < n_0/(2 + 2\gamma)$,
and choose any $k'$ satisfying $k < k' < n_0/(2 + 2\gamma)$. We also
choose the first term $\sigma_1$ and the scale factor $c$ according to

$$\sigma_1\ [\ldots] \tag{85}$$

$$\frac{2m}{2m + n_0/(2 + 2\gamma) - k'} \le c < 1 \tag{86}$$

and set $\sigma_j = \sigma_1 c^{j-1}$, $1 \le j \le J$, where $J$ is the index of the
smallest term of the $\sigma$ sequence satisfying

$$\sigma_J \ge \frac{2\sqrt{m}\,\|A\|_2\,\epsilon}{(1 + \gamma)(k' - k)}. \tag{87}$$

Then, following the steps of asymptotic SLO and terminating
at step $J$, one can achieve a solution within the distance $C\epsilon$
of the sparsest solution, where

$$C = \left(\frac{4m}{c\sqrt{1 + \gamma}\,(k' - k)} + 1\right)\|A\|_2. \tag{88}$$

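Proof: Let $n = x - As_0$. Then, $s_0 \in \mathcal{S}_\epsilon$ means that
$\|n\| < \epsilon$. Defining $\tilde{n} = A^T n$, we have

$$x = As_0 + n = As_0 + AA^T n = As_0 + A\tilde{n} = A(s_0 + \tilde{n}) = A\tilde{s},$$

where $\tilde{s} = s_0 + \tilde{n}$. Let $s_\sigma$ be the maximizer of $F_\sigma$ on
$As = x$, as defined in Theorem 1 of [17]. Note that $s_\sigma$ is
not necessarily the maximizer of $F_\sigma$ on the whole $\mathcal{S}_\epsilon$. The
argument is similar to that of Theorem 3 in [17]. From (35)
in Lemma 3 and (87), we have

$$\|\tilde{s} - s_0\| = \|\tilde{n}\| \le \|A\|_2\,\epsilon$$

$$|F_{\sigma_J}(\tilde{s}) - F_{\sigma_J}(s_0)| \le \frac{2\sqrt{m}}{\sigma_J(1 + \gamma)}\,\|\tilde{s} - s_0\| \le k' - k. \tag{89}$$

Hence

$$F_{\sigma_J}(s_0) \ge m - k \implies F_{\sigma_J}(\tilde{s}) \ge m - k'. \tag{90}$$

The vector $s_0$ does not necessarily satisfy $As = x$; however,
we have chosen $\tilde{s}$ to be the projection of $s_0$ onto the subspace
$As = x$. Hence, $\tilde{s}$ satisfies $As = x$. Moreover, $F_\sigma(\tilde{s}) \ge
m - k' > m - n_0/(2 + 2\gamma)$, hence $\tilde{s} \in \mathcal{A}$, and by optimizing
$F_\sigma$ from an arbitrary point in $\mathcal{A}$ we are guaranteed a solution
$s_\sigma$ for which $F_\sigma(s_\sigma) \ge m - k'$. Now, using Lemma 4 it is
easy to conclude that for $\sigma_1$ and $c$ chosen according to (85)
and (86),

$$F_{\sigma_1}(\tilde{s}) \ge m - k' \tag{91}$$

and

$$F_\sigma(s) \ge m - k' \implies F_{c\sigma}(s) \ge m - n_0/(2 + 2\gamma). \tag{92}$$

Following the steps of the proof of Theorem 2, but with the
sparsity factor $k$ replaced by $k'$, we can conclude that

$$F_{\sigma_J}(s_J) \ge m - k'. \tag{93}$$

Using Lemma 2, (90) and (93), we then have

$$\|s_J - \tilde{s}\| \le 2\sqrt{m(\gamma + 1)}\,\sigma_J \le \frac{4m\|A\|_2\,\epsilon}{c\sqrt{1 + \gamma}\,(k' - k)} \tag{94}$$

and

$$\|s_J - s_0\| \le \|s_J - \tilde{s}\| + \|\tilde{s} - s_0\| \le \frac{4m\|A\|_2\,\epsilon}{c\sqrt{1 + \gamma}\,(k' - k)} + \|A\|_2\,\epsilon = C\epsilon. \tag{95}$$

■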
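The interplay between the $\sigma$ schedule and the error bound in the noisy case can be sketched numerically. The code below is an illustration under stated assumptions, not a definitive implementation: it assumes the stopping threshold $2\sqrt{m}\|A\|_2\epsilon / ((1+\gamma)(k'-k))$ and the constant $C = (4m/(c\sqrt{1+\gamma}(k'-k)) + 1)\|A\|_2$, which are the forms consistent with the chain of inequalities ending in $C\epsilon$ above. The function names and sample numbers are hypothetical.

```python
import math


def sigma_schedule(sigma_1, c, sigma_floor):
    """Geometric sequence sigma_j = sigma_1 * c**(j-1), keeping terms >= sigma_floor."""
    if not 0.0 < c < 1.0:
        raise ValueError("the scale factor c must lie in (0, 1)")
    sigmas, sigma = [], sigma_1
    while sigma >= sigma_floor:
        sigmas.append(sigma)
        sigma *= c
    return sigmas


def noise_floor(m, gamma, k, k_prime, norm_A, eps):
    """Assumed stopping threshold for the last sigma in the noisy case."""
    return 2.0 * math.sqrt(m) * norm_A * eps / ((1.0 + gamma) * (k_prime - k))


def error_constant(m, gamma, k, k_prime, norm_A, c):
    """Assumed constant C such that the final error is at most C * eps."""
    return norm_A * (4.0 * m / (c * math.sqrt(1.0 + gamma) * (k_prime - k)) + 1.0)


if __name__ == "__main__":
    m, gamma, k, k_prime, norm_A, eps, c = 100, 1.0, 5, 10, 1.0, 0.01, 0.9
    floor = noise_floor(m, gamma, k, k_prime, norm_A, eps)
    sigmas = sigma_schedule(1.0, c, floor)
    print("J =", len(sigmas), "sigma_J =", sigmas[-1])
    print("C * eps =", error_constant(m, gamma, k, k_prime, norm_A, c) * eps)
```

Choosing $k'$ closer to $k$ raises the threshold and inflates $C$, while choosing it closer to $n_0/(2+2\gamma)$ pushes the admissible $c$ towards 1 and lengthens the schedule.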



Remark 1. If $k' \to k$, the error bound tends to infinity
in (88). If $k' \to n_0/(2 + 2\gamma)$, the computational cost would
tend to infinity as $c \to 1$ in (86). Hence $k'$ should be chosen
suitably between the two values. A simple sub-optimal choice
is presented in the next section.

Remark 2. In Theorem 3 of [17], we proved that
by suitably choosing $\sigma$ proportional to the noise level, we
can bound the Euclidean distance between the maximizer of
$F_\sigma$ and the sparse solution by the order of the noise standard
deviation. Experiment 2 of [17] (Section IV, Fig. 4) confirmed
the result of Theorem 3 of [17]. Here, (87) and (88) also
confirm this result. As can be seen from (88), the estimation
error depends linearly on the system noise.

VI. Finalizing the convergence analysis 

At this point we have acquired all the tools necessary for 
ensuring the convergence of the external loop, stability of the 
steepest ascent (internal loop), and robustness against noise 
for SLO. The only parameter we have not yet discussed is
$L$ (the number of iterations of the internal loop shown in
Fig. 1). In this section, we put all the previous results together
and provide values for all the parameters that are sufficient to 
guarantee successful convergence of SLO. 

We present results for three cases. In the first case, we
assume that suitable values of $n_0$ and $\gamma = \gamma(n_0)$ are known,
such that $\|s_0\|_0 < n_0/(2 + 2\gamma)$. In this case, the values of the
parameters that guarantee convergence are summarized in
Fig. 2 and the convergence is proved in Theorem 6.

In the second case, $\gamma$ is assumed unknown and we consider
a large Gaussian matrix $A$, and use the almost sure results of
Section III to determine $n_0$ and $\gamma$. The values of the parameters
for this case that guarantee convergence are summarized in
Fig. 3. For a random matrix $A$ with i.i.d. zero-mean
Gaussian entries, Theorem 7 shows that using these parameters
the sparse solution of $As = x$ can be found with probability
approaching 1 as the size of the system grows, as long as
$\|s_0\|_0 < \rho(\alpha)m$. Moreover, it is shown that the complexity
of SLO grows as $m^2$, which is faster than the state of the art
$m^{3.5}$ associated with Basis Pursuit and is comparable with
Matching Pursuit.

The third case deals with multiple source recovery, where
the sparsest solutions of multiple USLEs with the same
coefficient matrix are recovered at once. Multiple source
recovery may be viewed in the context of SCA [4] for
Blind Source Separation. In Experiment 6 of [17] we
observed that implementing SLO for multiple source recovery
in matrix multiplication form can make it faster than the SLO
algorithm for single solution recovery. Theorem 8 shows that
this approach can speed up SLO to the order of $m^{1.376}$.

A. Case of known $\gamma$

Putting the results of the previous sections together, the following
theorem shows that if the values of the parameters are
chosen as summarized in Fig. 2, then SLO will converge to
the sparsest solution. The proposed value for $L$ can be seen in
step 17 of the figure. Note also that the notation of Fig. 1
has been changed slightly in Fig. 2 to match the convergence
proof given next.

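Theorem 6 (The case of known $n_0$ and $\gamma$): Let $\gamma = \gamma(n_0)$
and, without loss of generality, assume matrix $A$ has orthonormal
rows. Let $x = As_0 + n$ for some $\|n\| \le \epsilon$ and
$\|s_0\|_0 \le k < n_0/(2(1 + \gamma))$. Let $\Delta \triangleq \frac{n_0/(2(1+\gamma)) - k}{4m}$. Then the
algorithm given in Fig. 2 can recover $s_0$ within a distance
$\delta \ge C\epsilon$, where

$$C \triangleq \left(\frac{4}{\Delta\sqrt{1 + \gamma'}} + 1\right)\|A\|_2 \tag{96}$$

and $\gamma'$ is as defined in step 4 of Fig. 2.

• Initialization:

1) $\Delta \leftarrow \dfrac{n_0/(2(1+\gamma)) - k}{4m}$
2) $k' \leftarrow k + m\Delta$
3) $k'' \leftarrow k + 2m\Delta$
4) $\gamma' \leftarrow \dfrac{n_0}{2(k + 3m\Delta)} - 1$ (i.e. $\dfrac{n_0}{2(1+\gamma')} = k + 3m\Delta$)
5) $F \leftarrow F_\sigma^{\gamma'}$
6) $\delta' \leftarrow \delta - \|A\|_2\,\epsilon$
7) $\sigma_1 \leftarrow \|A^T x\| / \sqrt{n_0/(2 + 2\gamma')}$
8) $\sigma_J \leftarrow \delta' / \left(2\sqrt{m(\gamma' + 1)}\right)$
9) $J \leftarrow \left\lceil \dfrac{\log(\sigma_1) - \log(\sigma_J)}{\log(1 + \Delta/2)} \right\rceil + 1$
10) $\log(c) \leftarrow -\dfrac{\log(\sigma_1) - \log(\sigma_J)}{J - 1}$
11) $\sigma_j \leftarrow \sigma_1 c^{j-1}$ $(1 \le j \le J)$
12) $\lambda'_{\max} \leftarrow \dfrac{2}{1 + \gamma'}$
13) $\lambda'_{\min} \leftarrow \dfrac{2(\gamma' - \gamma)}{(1 + \gamma)(\gamma'^2 + \gamma')}$
14) $\mu \leftarrow 2/(\lambda'_{\min} + \lambda'_{\max})$
15) $\kappa' \leftarrow \lambda'_{\max}/\lambda'_{\min}$
16) $\mathrm{CR}' \leftarrow (\kappa' - 1)/(\kappa' + 1)$
17) $L \leftarrow \left\lceil \dfrac{-\log(\Delta/4) - \tfrac{1}{2}\log(\gamma' + 1)}{-\log(\mathrm{CR}')} \right\rceil + 1$

• For $j = 1, \dots, J$:

1) $\sigma \leftarrow \sigma_j$.
2) If $j \ge 2$, $s_{j,1} \leftarrow s_{j-1,L}$. If $j = 1$, $s_{1,1} \leftarrow A^T x$.
3) For $l = 1, \dots, L - 1$:
   - $s_{j,l+1} \leftarrow s_{j,l} + \mu\sigma^2 D^T D\nabla F_\sigma|_{s_{j,l}}$

• Output is $s_{out} \leftarrow s_{J,L}$.

Fig. 2. The SLO algorithm for the case of known $n_0$ and $\gamma(n_0)$ and $A$ with
orthonormal rows, with parameters shown that guarantee convergence to the
sparsest solution. $s_{j,l}$ is the solution estimate at the corresponding iteration.

Proof: The proof is constructed using the following steps:

Step 1: Let us set $\tilde{s} = s_0 + A^T n$; then we have $\tilde{s} \in \mathcal{S}_x$ and
also $F_\sigma(\tilde{s}) \ge m - k'$ for any $\sigma \ge \sigma_J$. Assume that we have

$$\sigma \ge \sigma_J = \frac{\delta'}{2\sqrt{m(\gamma' + 1)}}. \tag{97}$$

Then, from Lemma 3 we have

$$|F_\sigma(s_0) - F_\sigma(\tilde{s})| \le \frac{2\sqrt{m}}{(1 + \gamma')\sigma}\,\|s_0 - \tilde{s}\|. \tag{98}$$

Since $\|s_0 - \tilde{s}\| \le \|A^T\|_2 \cdot \|n\| \le \|A\|_2\,\epsilon$ and

$$\delta' \ge \frac{4}{\sqrt{1 + \gamma'}\,\Delta}\,\|A\|_2\,\epsilon, \tag{99}$$

from (97) and (98) we have

$$|F_\sigma(s_0) - F_\sigma(\tilde{s})| \le m\Delta \implies F_\sigma(\tilde{s}) \ge F_\sigma(s_0) - m\Delta \ge m - k'. \tag{100}$$

Step 2: We show that for any $1 \le j \le J$, if $F_{\sigma_j}(s_{j,1}) \ge
m - \frac{n_0}{2 + 2\gamma'}$, then $F_{\sigma_j}(s_{j,L}) \ge m - k''$, where the notations $s_{j,l}$
and $k''$ are defined in Fig. 2. Let $s_{opt}$ be the maximizer of
$F_{\sigma_j}$ on $\mathcal{S}_x$. Hence, $F_{\sigma_j}(s_{opt}) \ge F_{\sigma_j}(s_{j,1}) \ge m - \frac{n_0}{2 + 2\gamma'}$ and
from Lemma 2,

$$\|s_{j,1} - s_{opt}\| \le 2\sqrt{m(\gamma' + 1)}\,\sigma_j. \tag{101}$$

From (70), we conclude

$$\|s_{j,L} - s_{opt}\| \le (\mathrm{CR}')^{L-1}\,\|s_{j,1} - s_{opt}\| \le \frac{\Delta\sqrt{\gamma' + 1}}{4}\left(2\sqrt{m(\gamma' + 1)}\,\sigma_j\right) = \frac{\sqrt{m}\,\Delta\,\sigma_j(1 + \gamma')}{2}, \tag{102}$$

where the second inequality holds when the value of $L$ is
defined as in step 17 of Fig. 2. Hence, from Lemma 3

$$|F_{\sigma_j}(s_{j,L}) - F_{\sigma_j}(s_{opt})| \le \frac{2\sqrt{m}}{\sigma_j(1 + \gamma')}\,\|s_{j,L} - s_{opt}\| \le m\Delta. \tag{103}$$

Therefore, from (100) and (103) we have

$$F_{\sigma_j}(s_{j,L}) \ge F_{\sigma_j}(s_{opt}) - m\Delta \ge F_{\sigma_j}(\tilde{s}) - m\Delta \ge m - k''. \tag{104}$$

Step 3: We show that if $F_{\sigma_{j-1}}(s_{j-1,L}) \ge m - k''$, then

$$F_{\sigma_j}(s_{j,1}) \ge m - n_0/(2 + 2\gamma'). \tag{105}$$

From the algorithm of Fig. 2 we know that

$$c \ge \frac{1}{1 + \Delta/2} = \frac{2m}{2m + m\Delta}. \tag{106}$$

Then, choosing $A = k'' = k + 2m\Delta$ and $B = n_0/(2 + 2\gamma') =
k + 3m\Delta$ in Lemma 4 and substituting $s_{j,1} = s_{j-1,L}$, we have

$$F_{\sigma_{j-1}}(s_{j-1,L}) \ge m - k'' \implies F_{\sigma_j}(s_{j,1}) \ge m - n_0/(2 + 2\gamma'). \tag{107}$$

Step 4: Here, we prove by induction on $j$ that $F_{\sigma_j}(s_{j,L}) \ge
m - k''$. In the first step, we have $s_{1,1} = A^T x$ and

$$\sigma_1 = \|A^T x\| / \sqrt{n_0/(2 + 2\gamma')}. \tag{108}$$

Hence, from Lemma 4

$$F_{\sigma_1}(s_{1,1}) \ge m - n_0/(2 + 2\gamma'), \tag{109}$$

and from Step 2

$$F_{\sigma_1}(s_{1,L}) \ge m - k''. \tag{110}$$

Assume that

$$F_{\sigma_{j-1}}(s_{j-1,L}) \ge m - k'' \tag{111}$$

for some $j$. Then from the results of Step 3 and noting from
Fig. 2 that $s_{j,1} = s_{j-1,L}$, we obtain

$$F_{\sigma_j}(s_{j,1}) = F_{\sigma_j}(s_{j-1,L}) \ge m - n_0/(2 + 2\gamma') \tag{112}$$

and from Step 2,

$$F_{\sigma_j}(s_{j,L}) \ge m - k''. \tag{113}$$

We can conclude then that

$$F_{\sigma_J}(s_{out}) = F_{\sigma_J}(s_{J,L}) \ge m - k'' \ge m - n_0/(2 + 2\gamma'). \tag{114}$$

Step 5: From Lemma 2, (114), (100) and the choice of $\sigma_J$
given in step 8 of Fig. 2 we have

$$\|\tilde{s} - s_{out}\| \le 2\sqrt{m(\gamma' + 1)}\,\sigma_J = \delta' \tag{115}$$

and

$$\|s_0 - s_{out}\| \le \|\tilde{s} - s_{out}\| + \|s_0 - \tilde{s}\| \le \delta' + \|A\|_2\,\epsilon = \delta. \tag{116}$$

This completes the proof of convergence of SLO. ■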
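The initialization of Fig. 2 is a chain of closed-form assignments, so it can be written out directly. The following sketch is illustrative and rests on the reading of the figure used here: $\Delta = (n_0/(2(1+\gamma)) - k)/(4m)$, $\gamma' = n_0/(2(k+3m\Delta)) - 1$, and $L$ chosen so that $(\mathrm{CR}')^{L-1} \le \Delta\sqrt{\gamma'+1}/4$. The function name `sl0_parameters`, the dictionary keys, and the sample inputs are hypothetical.

```python
import math


def sl0_parameters(m, n0, gamma, k, delta, eps, norm_A, norm_ATx):
    """Compute the quantities of the initialization stage (steps 1 to 17)."""
    Delta = (n0 / (2.0 * (1.0 + gamma)) - k) / (4.0 * m)
    if Delta <= 0.0:
        raise ValueError("need k < n0 / (2 (1 + gamma))")
    gamma_p = n0 / (2.0 * (k + 3.0 * m * Delta)) - 1.0
    delta_p = delta - norm_A * eps
    if delta_p <= 0.0:
        raise ValueError("delta must exceed ||A||_2 * eps")
    sigma_1 = norm_ATx / math.sqrt(n0 / (2.0 + 2.0 * gamma_p))
    sigma_J = delta_p / (2.0 * math.sqrt(m * (gamma_p + 1.0)))
    if sigma_1 <= sigma_J:
        raise ValueError("sigma_1 must exceed sigma_J")
    log_ratio = math.log(sigma_1) - math.log(sigma_J)
    J = math.ceil(log_ratio / math.log(1.0 + Delta / 2.0)) + 1
    c = math.exp(-log_ratio / (J - 1))
    lam_max = 2.0 / (1.0 + gamma_p)
    lam_min = 2.0 * (gamma_p - gamma) / ((1.0 + gamma) * (gamma_p ** 2 + gamma_p))
    mu = 2.0 / (lam_min + lam_max)
    kappa = lam_max / lam_min
    cr = (kappa - 1.0) / (kappa + 1.0)
    # Assumes gamma > 0, so that kappa > 1 and log(cr) is finite and negative.
    L = math.ceil((math.log(Delta / 4.0) + 0.5 * math.log(gamma_p + 1.0))
                  / math.log(cr)) + 1
    return {"Delta": Delta, "k_p": k + m * Delta, "k_pp": k + 2.0 * m * Delta,
            "gamma_p": gamma_p, "delta_p": delta_p, "sigma_1": sigma_1,
            "sigma_J": sigma_J, "J": J, "c": c, "mu": mu, "CR": cr, "L": L}


if __name__ == "__main__":
    params = sl0_parameters(m=1000, n0=400, gamma=1.0, k=50, delta=0.1,
                            eps=0.001, norm_A=1.0, norm_ATx=1.0)
    for name, value in params.items():
        print(name, "=", value)
```

The resulting scale factor satisfies $c \ge 1/(1+\Delta/2)$ by construction, which is the property used in Step 3 of the proof.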
Remark 1. In the noiseless case ($\epsilon = 0$), SLO can recover
the sparsest solution within a distance $\delta$, for some $\delta > 0$, in a
finite number of steps. But as $\delta \to 0$, $\sigma_J$, i.e. the last value of
$\sigma$, tends to zero according to step 8 of Fig. 2, and $J$ tends to
$\infty$ according to step 9. Hence, the complexity of the algorithm
tends to infinity.

Remark 2. Note that the algorithm does not require
the exact value of the $\ell^0$ norm. Only an upper bound $k$ is
necessary.
• Initialization:

1) $\beta^* \leftarrow$ maximizer of $\beta/(2 + 2\gamma(\alpha, \beta))$ on $0 < \beta < \alpha$
2) $\gamma \leftarrow \gamma(\alpha, \beta^*)$
3) $n_0 \leftarrow \lceil \beta^* m \rceil$
4) $k \leftarrow \lceil rm \rceil$
5) $\delta \leftarrow C'\epsilon$, where $C'$ is defined in (117)
6) $\sigma_1 \leftarrow (1 + \sqrt{1/\alpha})(1 + \sqrt{1/\alpha} + \epsilon)$. (This step replaces step 7 of Fig. 2.)
7) Do initialization steps 1 ... 6 and 8 ... 17 of Fig. 2.

Fig. 3. SLO initialization parameters for the case of unknown $\gamma$. Step 6 here
replaces step 7 of Fig. 2.



B. Case of unknown $\gamma$

For a large Gaussian $A$, we can use the a.s. results of Section III
to find $n_0$ and $\gamma(n_0)$, and thus obtain the initialization
of SLO shown in Fig. 3. The following theorem guarantees
convergence of the algorithm in Fig. 3.

Theorem 7 (The case of unknown $n_0$ and $\gamma$): Let $A$ be an
$n \times m$ Gaussian matrix, and $n/m \to \alpha > 0$ as $m \to \infty$.
Let us fix $r < \rho(\alpha)$ and let $P_m$ denote the probability that the
algorithm in Fig. 3 can recover any $s_0$ from $x = As_0 + n$
within Euclidean distance of $\delta = C'\epsilon$, as long as $\|s_0\|_0 \le rm$,
$\|s_0\| \le 1$, and $\|n\| \le \epsilon$, where



c ^ 



16 



- + + (117) 



Then, we have $P_m \to 1$ as $m \to \infty$. Moreover, the complexity
of the algorithm is $O(m^2)$.

Proof: We know from Theorem 3 that $P\{\gamma(n_0) > \gamma\} \to 0$
as $m \to \infty$. Moreover, $P\{\|A\|_2 \le 1 + \sqrt{1/\alpha}\} \to 1$ as $m \to
\infty$ [27], [26]. Therefore, noting $x = As_0 + n$, we have

$$P\{\|A^T x\| \le (1 + \sqrt{1/\alpha})^2 + (1 + \sqrt{1/\alpha})\epsilon\} \to 1 \tag{118}$$

as $m \to \infty$, because $\|s_0\|_2 \le 1$ and $\|n\| \le \epsilon$. This means
that the condition imposed by step 6 of Fig. 3 is stricter than
that imposed by step 7 of Fig. 2. Thus, all the conditions of
Theorem 6 also apply for the algorithm in Fig. 3. Hence, the
Euclidean distance between the final solution and the sparsest
solution is less than $C\epsilon$, i.e.

$$\|s_{out} - s_0\|_2 \le C\epsilon \tag{119}$$

where $C$ is as defined in (96). Moreover, $P\{C \le C'\} \to 1$
as $m \to \infty$, where $C'$ is as defined in (117). Hence, the
accuracy is better than $C'\epsilon$ with probability tending to 1, which
completes the proof of the convergence result.

From Fig. 2 it is clear that the computational complexity of
SLO is $O(mnJL)$, and since $n/m \to \alpha > 0$, we can assume
$n = O(m)$. To obtain the final complexity result, we show that
$J = O(1)$ and $L = O(1)$ as $m \to \infty$. According to Fig. 2,

$$J \le \frac{\log(\sigma_1/\sigma_J)}{\log(1 + \Delta/2)} + 2. \tag{120}$$

From the initialization of $\Delta$ shown in Fig. 2,

$$\lim_{m \to \infty} \Delta \ge \frac{\beta^*/(2 + 2\gamma) - r}{4} = \frac{\rho(\alpha) - r}{4} > 0 \tag{121}$$

and

$$\lim_{m \to \infty} \log(1 + \Delta/2) > 0. \tag{122}$$

Hence, to show that $J = O(1)$, we need to show that

$$\lim_{m \to \infty} \sqrt{m}\,\sigma_1 < \infty \tag{123}$$

and

$$\lim_{m \to \infty} \sqrt{m}\,\sigma_J > 0. \tag{124}$$

To show (123), note that

/to ,. 1 



lim 



lim 



"^o" V^no/(2 + 27') ™->°o ^y{k + 3mA)/ 



2 1 
< -^=^= < —■ (125) 



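With $\sigma_1$ given in Fig. 3, (123) becomes an obvious conclusion
of (118) and (125). To show (124), note that from Fig. 2 we
obtain

$$\lim_{m \to \infty} \sqrt{m}\,\sigma_J = (\delta'/2)\lim_{m \to \infty}\sqrt{1/(1 + \gamma')} \ge (\delta'/2)\sqrt{3/(4(1 + \gamma))} > 0, \tag{126}$$

where we have used the fact that

$$\frac{n_0}{1 + \gamma'} = \frac{3n_0}{4(1 + \gamma)} + \frac{k}{2} \implies \frac{1}{1 + \gamma'} \ge \frac{3}{4(1 + \gamma)}. \tag{127}$$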

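Next, we show that $L = O(1)$. Note that from Fig. 2

$$L \le \frac{\log(\Delta/4)}{\log(\mathrm{CR}')} + 2. \tag{128}$$

From (121) we know that $-\log(\Delta/4)$ is bounded. Hence, to
complete the proof of $L = O(1)$, we need to show that

$$\lim_{m \to \infty} \log(\mathrm{CR}') < 0 \iff \lim_{m \to \infty} \mathrm{CR}' < 1. \tag{129}$$

From the definition of $\lambda'_{\min}$, $\lambda'_{\max}$, and $\kappa'$ in Fig. 2,

$$\kappa' = (\gamma'\gamma + \gamma')/(\gamma' - \gamma). \tag{130}$$

Observe that

$$\gamma' - \gamma = \frac{n_0}{2(k + 3m\Delta)} - \frac{n_0}{2(k + 4m\Delta)} = \frac{n_0\,m\Delta}{2(k + 3m\Delta)(k + 4m\Delta)} \tag{131}$$

and

$$\lim_{m \to \infty} \frac{\gamma' - \gamma}{1 + \gamma'} = \lim_{m \to \infty} \frac{m\Delta}{k + 4m\Delta} = \lim_{m \to \infty} \frac{\Delta}{k/m + 4\Delta} \ge \frac{\Delta}{4\Delta + r} = \frac{\Delta}{\rho(\alpha)} > 0. \tag{132}$$

Also note that from (127), we have

$$\gamma' \le \tfrac{4}{3}(1 + \gamma) - 1. \tag{133}$$

Then, from (132) and (133) one can conclude

$$\lim_{m \to \infty} \kappa' < \infty \tag{134}$$

and

$$\lim_{m \to \infty} \mathrm{CR}' = 1 - 2\lim_{m \to \infty} \frac{1}{\kappa' + 1} < 1. \tag{135}$$

■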
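The claim that $J$ and $L$ stay bounded as $m$ grows can be illustrated numerically. The sketch below is a simplified check under stated assumptions: it holds $\beta = n_0/m$, $r = k/m$, $\gamma$, $\|A^T x\|$ and $\delta'$ fixed, skips the rounding of $n_0$ and $k$ to integers, and reuses the parameter formulas of Fig. 2 as read here. The function name `loop_counts` and the sample values are hypothetical.

```python
import math


def loop_counts(m, beta, r, gamma, norm_ATx, delta_p):
    """Return (J, L) for a system of size m with fixed ratios beta and r."""
    n0, k = beta * m, r * m  # idealised: no rounding to integers
    Delta = (n0 / (2.0 * (1.0 + gamma)) - k) / (4.0 * m)
    gamma_p = n0 / (2.0 * (k + 3.0 * m * Delta)) - 1.0
    sigma_1 = norm_ATx / math.sqrt(n0 / (2.0 + 2.0 * gamma_p))
    sigma_J = delta_p / (2.0 * math.sqrt(m * (gamma_p + 1.0)))
    J = math.ceil(math.log(sigma_1 / sigma_J) / math.log(1.0 + Delta / 2.0)) + 1
    kappa = gamma_p * (1.0 + gamma) / (gamma_p - gamma)  # equivalent form of kappa'
    cr = 1.0 - 2.0 / (kappa + 1.0)                       # CR' = (kappa'-1)/(kappa'+1)
    L = math.ceil((math.log(Delta / 4.0) + 0.5 * math.log(gamma_p + 1.0))
                  / math.log(cr)) + 1
    return J, L


if __name__ == "__main__":
    for m in (10 ** 3, 10 ** 4, 10 ** 6):
        print(m, loop_counts(m, beta=0.3, r=0.02, gamma=1.5,
                             norm_ATx=1.0, delta_p=0.1))
```

Under these idealisations $\Delta$, $\gamma'$ and the ratio $\sigma_1/\sigma_J$ depend only on the fixed ratios, so the printed pair $(J, L)$ is the same for every $m$; the total cost $O(mnJL)$ then grows as $m^2$.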



Remark 1. In [17], we experimentally observed that the
optimal value of $L$ is a small constant (Fig. 5, Experiment 2,
Section IV). Here, we proved that $L$ is bounded as $m \to \infty$.
C. Multiple Sparse Solution Recovery Case

Thus far, we discussed the recovery of the sparsest solution
of a USLE containing a single measurement vector. In SCA
applications one deals with multiple measurement vectors.

The resulting system of equations can be written in matrix
form:

$$X = AS + N, \tag{136}$$

where $X \triangleq [x(1), \dots, x(T)] \in \mathbb{R}^{n \times T}$, $S \triangleq
[s(1), \dots, s(T)] \in \mathbb{R}^{m \times T}$ and $N \triangleq [n(1), \dots, n(T)] \in
\mathbb{R}^{n \times T}$. As observed in Experiment 6 of [17], when we apply
the MSLO (SLO for multiple sparse recovery) of Fig. 4, the
overall computational complexity reduces as compared to
$T$ separate applications of the vector version of SLO. The
following theorem supports this observation.

Theorem 8: Under the conditions of Theorem 2, using the
algorithm shown in Fig. 4 to recover the sparsest solutions [...]

Analogous to the approach of Experiment 6 in [17], we use the
matrix form (136). We replace the final loop with steps shown [...]

[...] Winograd algorithm [34]. The overall complexity is T/m [...]

[...] that per sample complexity is $O(m^{1.376})$.

• Initialization: repeat initialization steps 1 ... 17 of Fig. 2.

• For $j = 1, \dots, J$:

1) $\sigma \leftarrow \sigma_j$.
2) If $j \ge 2$, $S_{j,1} \leftarrow S_{j-1,L}$. If $j = 1$, $S_{1,1} \leftarrow A^T X$.
3) For $l = 1, \dots, L - 1$:
   - $S_{j,l+1} \leftarrow S_{j,l} + \mu\sigma^2 D^T D\nabla F_\sigma|_{S_{j,l}}$

• Output is $S_{out} \leftarrow S_{J,L}$.

Fig. 4. The MSLO algorithm (SLO for multiple sparse recovery).

VII. Conclusion

We had recently proposed the SLO algorithm, which we
showed empirically to be efficient and accurate for recovery of
sparse solutions using $\ell^0$ minimization [17]. Its convergence
properties, however, were only partially analyzed, so the
theoretical justification for SLO remained incomplete. The
current paper provides the theoretical justification for SLO.

Several results were presented. First, general results were
derived showing that a judicious choice of parameter values
guarantees that SLO converges to the sparsest solution provided
that the given system satisfies the recovery conditions. These
conditions were derived in terms of the lower asymmetric
RIC and Euclidean norm of the system. We then adapted
the convergence results for the special case where the system
is a large Gaussian matrix. Next, we showed that convergence
of SLO can be similarly guaranteed in the case of noise. The
noise results combined with our previous work and numerical
experiments presented in [17] indicate that SLO exhibits
good robustness properties in noise. Lastly, we provided the
complete parameter setting of SLO that guarantees recovery
of the sparsest solutions in the case of general as well as
Gaussian systems. We then extended the SLO algorithm to
the case of multiple measurement vectors and provided the
necessary parameter settings for convergence.

Also presented were computational complexity results for
SLO in the cases of single and multiple measurement vectors.
We showed that in the limiting case $m \to \infty$ and
$n/m \to \alpha > 0$, the complexity is $O(m^2)$ and is comparable
to that of orthogonal MP techniques. Further, we showed
that recovering multiple sparse solutions simultaneously by
using MSLO reduces complexity per individual solution to
$O(m^{1.376})$.

The main purpose of the presented results is to fulfill
the need for theoretical justification of SLO. A number of
papers have stated that RIP provides a strict condition for
analysis of sparse recovery algorithms and that it typically leads to
unnecessarily pessimistic choices for the theoretical parameter
values. Our empirical findings in [17] confirm this assessment
in the case of SLO as well. We have observed fast convergence
with excellent empirical recovery rates under weaker sufficient
conditions than those that can be obtained from an ARIP
analysis.